INTRODUCTION


Past relationships between statistics and 
the clinic have been, to say the least,
strained if not antagonistic. Although 
we have seen the development of psychological
statistics, agricultural statistics,
medical and even geological statistics, 
clinical statistics is still an unheard-of 
combination. Apparently the bedside 
needs of the patient are either too urgent 
or too little understood to permit the ob- 
jective evaluation of the concepts involved 
in clinical work. Nevertheless, because of 
the rise of interest in clinical work since 
World War II, and because some of the 
men involved in clinical work had pre- 
viously been exposed to statistical think- 
ing, a demand has arisen to bridge the 
gap between these two disciplines. The 
clinician on his part has begun to feel the 
need for objectifying some of his em- 
pirically gained intuitions, and the statis- 
tician on his part has begun to wonder 
whether his present day tools are ade- 
quate to handle the complexity of the 
clinical case. 

In analyzing the relationship between 
these two areas of research, it might be 
well to review the gradual, perhaps unno- 
ticed rapprochement that has occurred 
between them. The best indications of
this process can be noted in the two
recent reviews of the development of
statistics. Fisher in his address at the
inaugural meeting of the British Region
of the Biometric Society gave a kaleido- 
scopic review of the advances of science 
from the early days of the Greeks to the 
present. Science got its early start as a 
deductive process through the invention
of geometry, when “men learned to reason
deductively, from well defined abstract
concepts, to cogent and irrefragable
conclusions.” The purpose of Euclid’s 
Geometry however, was not aimed at 
artistically unified presentations alone, 
but had as its initial inspiration as well as 
its final goal practical applications in sur- 
veying, architecture, space description 
and measurement. Ultimately this precise 
deductive thinking gave rise to the branch 
of logic known as noumenal or philo- 
sophical deduction which, because of its 
earlier development completely swamped 
its tender sister—scientific or phenomenal 
inductive logic. To Galton, influenced by 
his half-cousin Darwin, is attributed the 
rise of scientific (inductive) logic, espe- 
cially in its statistical applications. As 
soon as modern man left his philosophical- 
ly (deductively) organized world, trans- 
mitted to him from the ancient Greeks 
through the middle ages, and began to 
make observation on nature both animate 
and inanimate, the limitations of deduc- 
tive thinking unsupported by its induc- 
tive counterpart became readily apparent. 
Faced with the universal variability in all 
biological and social data, Galton and his
generation were completely stymied in
their attempt to deal with these phe- 
nomena mathematically. There were no 
axioms to begin with because even the 
simplest truth about biology seemed to be 
subject to exceptions. Variability was 
the most characteristic event in nature. 
Galton accepted the challenge and made a 
virtue of variability by analyzing vari- 
ability itself to see its consistencies as 
well as its variation. This gave the im- 
petus to the present almost universal use 
of statistical methods in the inductive sci- 
ences of today. 

Weaver, in a more detailed analysis
of present day science, points out that 
scientific method has undergone three 
phases best characterized by the types of 
problems it dealt with: (1) problems of 
simplicity, (2) problems of disorganized 
complexity, and (3) problems of organ- 
ized complexity. Up to 1900 the physi- 
cal sciences had dealt with rather simple 
problems in which all but two variables, 
an independent and a dependent variable, 
could be kept constant. During the same 
period, the life sciences had not reached
even this simple level of development and
had to satisfy themselves with “collection,
description, classification and the observation
of concurrent and apparently correlated
effects.” Only the sketchiest beginnings
of quantitative theories were proposed,
and most of the data were qualitative
rather than quantitative.

The next step in scientific development 
was the removal of the two variable con- 
straint and the introduction of the multi- 
variable problem in which the single in- 
dividual or molecule is lost sight of, and 
only the net effect of millions of mole- 
cules, each free to move in its own way 
is measured in terms of such overall va- 
riables as production of heat, etc. In the 
life sciences, social statistics, morbidity 
and mortality statistics were developed to 
predict the general trend in a group; the 
single individual being totally lost sight 
of as an entity. The development of 
probability theory gave an impetus to 
these social and physical applications and 
contributed much to the establishment of 
generalizations that have proved useful in 
the physical as well as the life sciences. 

The final step was taken rather recently
in the recognition of problems of organized
complexity. This type of problem
lies midway between the problems of sim- 
plicity of the two variable type and its op- 
posite, the problems of disorganized com- 
plexity with its multitude of variables. 
The number of variables is still large, 
but they do not approach infinity. Fur- 
thermore, they are all problems which in- 
volve dealing simultaneously with a size- 
able number of factors which are inter- 
related in an organic whole. Despite the 
danger of misinterpreting Weaver's in- 
tent, I have interpreted this type of prob- 
lem as one in which our concern is with 
single individuals or groups of like- 
minded individuals. We are not con- 
cerned with the general laws applying to 
gases, but with why a given isotope in the 
gas chamber behaves differently from its 
neighbors. We are not so much con- 
cerned with the prediction of say an elec- 
tion, but with why John Jones votes the 
way he does. 

This brings the problem of the indi- 
vidual to the fore. We may liken the 
shift from problems of disorganized to 
those of organized complexity, to the 
shift in interest from the behavior of 
gases to the behavior of the subatomic 
or intra-atomic world. Off-hand, there is 
no reason why the laws developed for the 
extra-atomic world should not hold true 
for the intra-atomic world, and many of
them do, but they were sufficiently differ- 
ent to give rise to new concepts in physics, 
e.g., quanta. The likelihood is that new 
concepts must also arise in the life-sci- 
ences when we begin to deal with the so- 
cial atom, the single individual. 

In inaugurating this symposium, the 
starting point was the third level of de- 
velopment, namely organized complexity. 
To be sure, clinical psychology has hard- 
ly passed the first level of development, 
namely, that of collecting and classifying 
observations, but the second level of de- 
velopment, that of disorganized com- 
plexity is uniquely unsuitable for the 
treatment of clinical data. While individual
differences and generalizations
about average performance have proved 
useful in social, general, and experimental 
psychology, clinical psychology eschews 
these approaches and is concerned more
with the deviations in performance which
characterize a given individual. For this 
reason, perhaps, the clinical sciences may 
be able to dispense with the intermediate 
stage and proceed directly to the study of 
organized complexity.

Present day statistical treatment of 
clinical data is primarily group-centered 
rather than individual-centered. The clini- 
cian who attempts to apply statistics in 
the classification of his cases or the evalu- 
ation of his results has to adapt the group- 
centered methods to the individual-cen- 
tered material. Many such adaptations 
have already been suggested and utilized 
in various clinics and clinical research pro- 
grams. But no concerted effort has ever 
been made to bring these methods to- 
gether. The purpose of this symposium 
is to collect the adaptations of group sta- 
tistics and provide the clinician with ex- 
amples of their application to his prob- 
lem. The following types of problems
are frequently encountered by clinicians:
(1) integration of test or observational
data into patterns or profiles, (2) determining
whether a given change in performance
or behavior is significant or due
to chance, and (3) determining the de- 
gree of consistency or variability or scat- 
ter that a given performance exhibits. 
There are numerous more specific prob- 
lems, but most of them could be classified 
into the above categories. 

Regarding the first type of problem,
that of patterns or profiles for the description
or classification of a series of observations
or test scores on a given individual,
several methods have been proposed.
Stephenson describes an application
of factor analysis to this problem.
Cronbach describes a method
which might be designated as a pattern
analysis approach. In general, the problem
seems to reduce itself to finding individual
similarities between the various
cases under study and classifying those
cases which show the greatest similarity
into one group or type. A type of this variety
might be defined as a group of individuals
who exhibit a common pattern
in their scores or behavior, and whose
frequency is greater than
chance expectancy.

The second problem, that of determining
the significance of an observed change in
a single individual, is one which has received
attention only recently.


BASIC ASSUMPTIONS


In the usual study of normal indi- 
viduals with psychological tests, once the 
universe under investigation is specified, 
a random sample can be drawn and esti- 
mates of the parameters characterizing 
the universe can be obtained to any de- 
gree of accuracy desired. The changes 
in score of the selected sample after a 
certain period of time, or after the ap- 
plication of the experimental factor, can 
also be compared to the expected chance 
or systematic variation in a control group 
selected from the specified universe to see 
whether the change is significant. Cer- 
tain assumptions underlie this approach. 
First, that the score obtained for a given 
individual is a random sample of his true 
score distribution. Second, that the score 
variability (variation around the true 
value of the score) is constant for each 
individual in the sample. The variability
of the entire sample can therefore be regarded
as a basis for estimating the variability
of each individual. In the case of
abnormal individuals, several important 
requirements that are prerequisites for 
the above mentioned treatment are lack- 
ing. The universe from which the sam- 
ple is drawn can not always be accurately 
specified, the usual standard error for 
evaluating changes is untrustworthy, and 
the whole method of ordinary group sta- 
tistics fails because its assumptions are 
not fulfilled. Consequently a modification
or a new approach is required. In
order to develop this new approach certain
axioms and postulates are necessary:


1. In the study of a single individual,
especially of a so-called abnormal individual,
we must treat each case as
an independent universe. Later, when
the characteristics of each of these
universes become known, we may be
able to classify them into groups of
like-structured or similar universes.
Until such knowledge becomes available,
it is unwarranted to classify individuals
as equivalent even if they
have made identical scores on a
series of tests.

2. Every individual is characterized by
a given level of performance, of
which the observed test score is a
random sample.

3. Every individual is also characterized
by a given degree of variability
around the level of performance.
This variability is characteristic of
the individual and varies as much or
more from person to person as does
the level around which this variation
occurs. This variability or its
opposite, consistency, may be likened
to the physiological consistency which
goes under the name of homeostasis.
The behavioral field as well as the
internal environment of the individual
is subject to the influence of
slight alterations in the stimulation
of the organism internally or externally,
to which it responds by changes
in performance, but this change in
performance follows a characteristic
pattern dependent upon the individual’s
characteristic variability or
homeostatic pattern.

4. The effect of change in stimulation,
internal and external, is to bring
about an alteration either in the level
of performance, the variation in performance,
or in both.
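
The axioms above on level and variability can be made concrete with a small sketch. It assumes nothing beyond repeated readings on the same test for each case; the scores and the names `readings` and `level_and_variability` are hypothetical and serve only as illustration. The point is that two cases may share a level yet differ widely in scatter, so a pooled group estimate of variability would misdescribe both.

```python
from statistics import mean, stdev

# Hypothetical repeated readings on the same test for two individuals.
readings = {
    "case_A": [50, 52, 48, 51, 49],   # stable performer
    "case_B": [50, 70, 30, 65, 35],   # same level, far greater scatter
}

def level_and_variability(scores):
    """Return (level, variability) estimated from one individual's readings."""
    return mean(scores), stdev(scores)

# Each case is treated as its own universe: one estimate pair per individual.
profile = {case: level_and_variability(s) for case, s in readings.items()}
for case, (level, spread) in profile.items():
    print(f"{case}: level={level:.1f}, variability={spread:.1f}")
```

With only five readings per case the estimates are rough; the sketch shows the bookkeeping, not a recommended sample size.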


These axioms constitute the basis on 
which treatment of an individual’s data 
differs from the treatment of group data.
Essentially the differences stem from the 
fact that the clinician is unwilling to re- 
gard his cases as constituting a meaning- 
ful universe. Consequently he must re- 
sort to considering each case as a universe 
unto itself. 

The consequences of these axioms are 
significant for the treatment of clinical 
data as follows: 


1. In addition to obtaining a sample of
a given individual’s performance, it
is necessary also to obtain a sample
of his variability. This necessitates
taking 4 or more readings of observations
on each case under study,
for each of the states that is under
investigation.

2. It is necessary to determine the types
of influences exerted by variables
other than those which the clinician
may be attempting to vary. As a
result of our dependence on repeated
measurements or observations, practice
effects are bound to occur. Since
these practice effects are probably
uniquely determined, it becomes impossible
to evaluate those effects on
a group basis, and methods for determining
practice effects must be
invented for the single individual.


In addition to the experimental variable 
under investigation at least two addi- 
tional variables must be investigated. 
Some patients tend to improve (or get 
worse) spontaneously, regardless of the 
variables manipulated by the experiment- 
ally-minded clinician. The mere atten- 
tion paid to the patient who may have 
previously felt hopeless or neglected, is 
sufficient to alter his performance tem- 
porarily or permanently, e.g., “total 
push” effects. Care must be taken not 
to attribute changes produced by the
“total push” to the workings of the ex- 
perimental variable. In addition to the 
externally apparent changes which may 
affect the individual, certain less appar- 
ent changes may be at work such as mood 
swings, cooperativeness, level of motiva- 
tion, etc. 


CURRENTLY AVAILABLE METHODS 


It is needless to say that answers to 
all of these problems are not yet avail- 
able. But the description of the prob- 
lems facing the study of the individual 
ought to provide the clinical research 
worker with a program for measuring 
the quantifiable factors and circumvent- 
ing those that baffle measurement. 

In examining the field of available 
methods, the methods of analysis of vari- 
ance and covariance for the single case 
are found most satisfactory in setting up 
experimental designs for dealing with 
the above data. Within the universe of 
the single individual, the readings may be 
regarded as more or less independent. 
In this way an analysis of variance can 
be made of each case separately, appor- 
tioning the total variability into its com- 
ponents. These component parts can 
then be studied in groups of contrasted 
individuals (treated and untreated), and 
the significance of the results noted. This
procedure may be applied not only to the
changes in level of performance, but also 
to changes in variability of performance. 
In order to remove the influence of vari- 
ability when considering changes in level, 
analysis of covariance may be resorted 
to. Similarly, the effect of level of per- 
formance may be removed when changes 
in variability are under consideration.* 
The analysis of the individual case has 
been stressed thus far because it seems 
uniquely suited to clinical data. It must 
be remembered that the clinician asks not 
only how does this case differ from the 
other cases that he has seen, but also asks 
in what way does this case resemble his 
previously observed cases. Hence both 
integrative as well as differential methods 
are of use to the clinician. Among the 
tools which serve both a differential as 
well as an integral purpose is the method 
of the discriminant function which serves 
to integrate a group of variables into a 
total score such that the maximum dif- 
ferentiation from a contrasted group is 
effected. A recent improvement of the
discriminant function is the development
of the partial discriminant functions.
These bear the same relationship to the zero
order discriminant function that partial
correlations bear to zero order correlations.
Such a partial discriminant function
would be especially useful when two
groups are compared postoperatively on a 
test for which they were not equated 
preoperatively. Perhaps the mathemati- 
cal statisticians can provide a tool for 
fractionating a given population into sub- 
groups such that the difference between 
the means of these subgroups would be a 
maximum. This would be the converse 
problem of the discriminant function, and 
may turn the technique into a method for 
discovering types. 
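
The single-case analysis of variance mentioned at the start of this discussion can be sketched as follows. This is a minimal illustration, not the procedure of any cited paper: it assumes the readings within a state are independent, uses only two hypothetical states ("before" and "after"), and the function name `single_case_anova` and the readings are invented for the example.

```python
from statistics import mean

def single_case_anova(states):
    """One-way analysis of variance on one individual's readings.

    `states` maps a state label to the list of readings taken in that state.
    Returns (F, df_between, df_within).
    """
    all_readings = [x for xs in states.values() for x in xs]
    grand = mean(all_readings)
    # Variation of the state means about the individual's grand mean.
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in states.values())
    # Variation of the readings about their own state mean.
    ss_within = sum((x - mean(xs)) ** 2 for xs in states.values() for x in xs)
    df_between = len(states) - 1
    df_within = len(all_readings) - len(states)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical readings for one patient, four per state.
patient = {"before": [10, 12, 11, 13], "after": [15, 17, 16, 18]}
f_ratio, df_b, df_w = single_case_anova(patient)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The resulting F would then be referred to the usual F table with the stated degrees of freedom. The within-state sum of squares is also the individual's own estimate of variability, so no group standard error is borrowed.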

The analysis of individual similarities 
finds its apex in the inverted factor 
analysis or Q technique. Here each sub- 
ject is correlated with all the others and 
the matrix of intercorrelations subjected 
to a factor analysis to determine the types 
inherent in the data.

*For an application of this method see (7),
(8) and (11).

The sequential analysis method provides
the experimenter with a means of
determining how many observations are
required for testing a given hypothesis.
In the study of intra-individual variability,
a suitable derivative of sequential
analysis ought to prove very useful to
the clinical statistician.
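
As an illustration of how a sequential procedure decides when enough observations have been taken, the sketch below implements Wald's sequential probability ratio test for a simple success/failure record. It is the textbook form of the test, not the derivative for intra-individual variability that the text calls for; the function name `sprt_bernoulli`, the hypothesized proportions, and the trial record are assumptions made for the example.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for H0: p=p0 vs H1: p=p1.

    Returns (decision, n_used); decision is "accept H1", "accept H0",
    or "continue" if the data ran out before a boundary was crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing it favors H1
    lower = math.log(beta / (1 - alpha))   # crossing it favors H0
    llr = 0.0                              # running log likelihood ratio
    for n, x in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(observations)

# Hypothetical record for one patient: 1 = success on a trial, 0 = failure.
trials = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
print(sprt_bernoulli(trials, p0=0.5, p1=0.8))
```

The number of observations is not fixed in advance; testing stops as soon as the accumulated evidence crosses either boundary.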

In all of these attempts, the underlying 
problem is that of finding a mathematical 
model for the phenomena before us. 
Without such mathematical models we 
can never hope to develop rigorous tests 
of the adequacy of a given conclusion. 
To be sure the mathematical models are 
rigid and do not permit even slight de- 
viations to go unnoticed. But that is the 
very purpose of the mathematical model, 
and as soon as the deviations get to be too 
troublesome, the model can be modified 
accordingly. As an example of such a 
mathematical model, the J-curve hypoth- 
esis of F. Allport might be mentioned. 
The J-curve hypothesis states in essence 
that when social customs and mores af- 
fect a population differentially rather 
than in a uniform manner, the resulting 
distribution of behavior will be J-shaped 
rather than symmetrical. Most of the 
data which have been analyzed under this 
conformity hypothesis are in discrete 
rather than continuous steps. A mathe- 
matical model that suits such a hypoth- 
esis is the binomial distribution which can 
represent discrete as well as continuous 
variables and furthermore produces sym- 
metrical distributions for p= q= .50 
and unsymmetrical and even J-shaped dis- 
tributions when p approaches 1.00 or 
0.00. By fitting a binomial to the con- 
formity data, the value of p itself is 
found to be useful as an index of the 
degree of conformity.
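
A short sketch may help show how the binomial behaves as a model here. It assumes a hypothetical four-step conformity scale scored 0 to 4 and estimates p by the method of moments (mean score divided by the number of steps); the frequencies and the names `binomial_pmf` and `estimate_p` are invented for illustration and are not taken from the conformity studies.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n 'conforming' steps under a binomial model."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def estimate_p(counts):
    """Moment estimate of p from observed frequencies of scores 0..n."""
    n = len(counts) - 1
    total = sum(counts)
    mean_score = sum(k * c for k, c in enumerate(counts)) / total
    return mean_score / n

symmetric = binomial_pmf(4, 0.5)    # p = q: peaks in the middle
j_shaped = binomial_pmf(4, 0.9)     # p near 1: piles up at the conforming end

# Hypothetical frequencies for scores 0..4 on a four-step conformity scale.
observed = [1, 2, 7, 30, 60]
print([round(q, 3) for q in symmetric])
print([round(q, 3) for q in j_shaped])
print(round(estimate_p(observed), 3))
```

The fitted p serves as the index of conformity; comparing `binomial_pmf(4, p)` scaled by the sample size with the observed frequencies gives the check on fit.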

In applying this model to various sets 
of data it soon became apparent that some 
conformity data do not lend themselves 
to a binomial fit. This occurred in the 
analysis of the behavior of motorists at 
cross streets. A careful scrutiny of the 
data revealed that they could be divided 
into two parts—data for 3 p.m. when chil- 
dren leave school, and data for the non- 
school hours. Upon the assumption that 
two conformity trends characterized this 
material (greater conformity to traffic 
rules at 3 p.m. and lesser conformity at
other times), two binomials were fitted to
the data which corresponded to the dif- 
ferent conformity levels and the com- 
bined results fitted the total data ade- 
quately. 
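
The two-binomial fit described above can be sketched in the same spirit. The frequencies below are hypothetical and are not the motorist data; the helper names are likewise invented. A single binomial is fitted to the pooled data, separate binomials are fitted to the two periods, and the two sets of expected frequencies are compared with the pooled observations by a chi-square sum.

```python
from math import comb

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def fit_p(counts):
    """Moment estimate of p from frequencies of scores 0..n."""
    n = len(counts) - 1
    return sum(k * c for k, c in enumerate(counts)) / (n * sum(counts))

def expected_counts(counts):
    """Expected frequencies from a single binomial fitted to `counts`."""
    return [sum(counts) * q for q in binomial_pmf(len(counts) - 1, fit_p(counts))]

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical frequencies of conformity scores 0..3 for two periods.
school_hour = [1, 6, 29, 64]     # around 3 p.m.: high conformity
other_hours = [30, 43, 22, 5]    # remaining hours: low conformity
total = [a + b for a, b in zip(school_hour, other_hours)]

single = expected_counts(total)
combined = [a + b for a, b in zip(expected_counts(school_hour),
                                  expected_counts(other_hours))]
print(round(chi_square(total, single), 1), round(chi_square(total, combined), 1))
```

Here the periods are known in advance, as in the text. When the subgroups are not known, separating them is a harder estimation problem that this sketch does not attempt.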

SUMMARY 


The purpose of this symposium was 
not to initiate a new type of statistics 
but to adapt as much as possible of the 
current group-centered methods to the 
purposes of clinical research. The needed 
adaptations of group methods to the problems
met in clinical research were stressed
by several contributors. The theoretical 
reasons why present-day methods are not 
fully applicable to the understanding of 
an individual case were outlined and the 
ground was prepared for the presentation 
of individual-centered methods. 

One outcome of this symposium was a
demonstration of the possibility of treating
each clinical case as a separate universe.
Such stress had formerly been
made only by the idiographers. They
have won their point, but their claim
that the idiographic method can never be
handled statistically is not shared by the
author. In fact, some of the papers in
this symposium show the way for quantifying
the idiographic approach.

Another outcome has been the demon- 
stration of the desirability of collabora- 
tion between clinical research workers 
and mathematical statisticians. Many of 
the problems facing the clinical research 
worker have already been adequately 
solved by statisticians. Others promise to 
lend themselves to solution as soon as the
basic problem becomes clear. Indeed, it
becomes quite apparent that as soon as a
given clinical problem is specified adequately,
a solution soon becomes available.


INTRODUCTION


The purpose of this paper is to discuss
the methods of handling the time dimension
in the statistical analyses of material
obtained from a single individual. Various
methods have been employed to obtain
statistical confirmation of clinical interpretations
or to obtain insight into the
personality of single individuals by an
analysis of successive measures on the
same person. The common theme running
through all such methods is the time dimension,
and the proper handling of that
dimension of the data is important for an
accurate interpretation.

One statistical method which has been 
recently used by several investigators is 
the analysis of intra-individual correla- 
tions—to use this author’s terminology. 
Cattell has called this the P-technique 
to emphasize its methodological rela- 
tionship to the Q-technique and R-tech- 
nique. The method may be briefly de- 
scribed as follows: An individual is re- 
peatedly measured on a number of dif- 
ferent variables. This series of measure- 
ments is taken as a sample from the hypo- 
thetical population of such measures on 
the individual. In this sample, the cor- 
relation of any two variables may be ob- 
tained. Figure 1 presents data which may 
be used as an example. A child is rated 
on two variables, competitiveness and so- 
cial poise, each day for a sequence of 20 
days. The correlation between the two 
variables is .79. 
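
The computation involved is the ordinary product-moment correlation, taken over occasions rather than over persons. The sketch below uses hypothetical daily ratings for one child; they are not the data of Figure 1 and will not reproduce the coefficient reported there. The function name `pearson_r` is likewise only illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical daily ratings of one child over 20 successive days
# (not the data of Figure 1, which are not reproduced here).
competitiveness = [2, 3, 2, 4, 3, 4, 5, 4, 5, 6, 5, 6, 7, 6, 7, 7, 8, 7, 8, 9]
social_poise    = [1, 2, 3, 2, 4, 3, 4, 5, 5, 4, 6, 6, 5, 7, 7, 8, 7, 8, 9, 8]

r = pearson_r(competitiveness, social_poise)
print(f"intra-individual r = {r:.2f}")
```

Each pair of ratings is one occasion for the same individual, so the sample size is the number of days, not the number of children.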

This sort of correlation between two
temporal series has been chiefly studied,
from a statistical point of view, in eco- 
nomic data. Whether the methods which 
have proved useful for economists are 
equally useful for the analysis of longitu- 
dinal data in biology and psychology is 
questionable. Those methods imply tacit 
assumptions which are intuitively reason- 
able for the treatment of economic data 
but which are not so reasonable in the 
analysis of intra-individual data. An ex- 
amination of the data in Figure 1 will 
illustrate the various problems in analyz- 
ing temporal data. 

It is perfectly obvious that neither of the 
samples of twenty measurements shown 
in Figure 1 is a random sample in the 
usual sense. There is a clear-cut temporal 
trend in each sample. If the usual methods 
of time series analysis were adopted, these 
trends would be removed before any cor- 
relation between the two variables were 
computed. The failure to remove the trend 
would be said to lead to a “spurious” cor- 
relation. 

In what sense is such a correlation
spurious? The context in which such a
correlation was considered spurious might 
be illustrated by the following example. If 
the production of wheat and the price of 
wheat were studied for each year from 
1840 to 1940 by correlating the two over 
that sample of 100 years, there would cer- 
tainly be a positive correlation between 
the two. One might be led to conclude 
that when the supply went up, the price 
went up. Yet if the temporal trend was 
removed, that is, if the effect of the gen- 
erally increasing population, production 
and price level over the last hundred years 
was taken out, the correlation would prob- 
ably be negative. Years with a large supply 
relative to the trend would be years with 
a low price relative to the trend. All sorts 
of such “spurious” positive correlations 
can be found, between the number of 
teachers in the United States and the na- 
tional consumption of alcoholic beverages, 
for example. It is reasonable in this eco- 
nomic context to think of such correla- 
tions as spurious; the correlations re- 
flected a general factor whose effect was 
misleading and which was not of interest 
to the research worker. Yet in Figure 1, 
the fact that competitiveness increased 
as social poise increased is interesting. 
Truly enough, they may both be consequences
of the child’s adjustment process,
but for an analysis of the adjustment
process, the fact that they both went up 
rather than going in opposite directions is 
a fact which must be accounted for by an 
acceptable theory of development. Even 
the fact that the correlation might be- 
come negative if “time” were partialled 
out does not make the original correlation 
spurious. 
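
The reversal of sign that can follow when the trend is removed is easy to demonstrate numerically. The sketch below builds two hypothetical yearly series in the manner of the wheat example: both rise with time, but a year above trend in one is below trend in the other. Removing a straight-line trend from each series and correlating the residuals is equivalent to partialling out time, provided the trend is linear; the figures and function names are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def detrend(series):
    """Residuals of `series` about its least-squares straight line in time."""
    n = len(series)
    t = range(n)
    mt, ms = (n - 1) / 2, sum(series) / n
    slope = (sum((a - mt) * (b - ms) for a, b in zip(t, series))
             / sum((a - mt) ** 2 for a in t))
    return [b - (ms + slope * (a - mt)) for a, b in zip(t, series)]

# Hypothetical yearly figures: both rise over time, but a year with supply
# above its trend has price below its trend.
wobble = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1]
supply = [10 + 2 * t + w for t, w in enumerate(wobble)]
price = [5 + 1 * t - w for t, w in enumerate(wobble)]

raw = pearson_r(supply, price)
partial = pearson_r(detrend(supply), detrend(price))
print(f"raw r = {raw:.2f}, r with trend removed = {partial:.2f}")
```

Both coefficients describe the same data; which one is of interest depends on whether the common trend is itself part of what the investigator wishes to explain.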

The argument for the spuriousness of 
the correlation is not merely that both 
variables are correlated with a third; it is 
the fact that this third variable is time. 
Part of the definition of random sample is, 
of course, that the various measures must 
be independent of each other, and the 
existence of temporal trends is taken to 
prove dependence. It would seem on the 
face of it that these measurements on suc- 
cessive days are not independent of each 
other. The value of the second measure is 
dependent upon the fact that it is second. 
If the first measure were higher, the second
measure would probably be different
from what it is now. Just what is implied,
however, by this concept of dependence 
must be carefully analyzed. It is the con- 
tention of this paper that there are at 
least two kinds of dependence among the 
successive measures. One of these—which 
we will call situational dependence—does 
not invalidate the intra-individual correla- 
tion coefficient. The second—which we 
will call measure-to-measure dependence 
—does invalidate the correlation in the 
sense that its usual statistical interpreta- 
tion is not valid, i.e., its sampling distri- 
bution is not accurately described by the 
usual formulae. 


SITUATIONAL DEPENDENCE 


Situational dependence is characterized 
by the fact that both correlated variables 
are related to a single third factor or to a 
cluster of auxiliary factors, which are cor- 
related with time. In the data presented 
in Figure 1, for example, there is such a 
third factor, familiarity of the situation. 
It is reasonable to suppose that children’s 
behavior is influenced by the familiarity of 
the situation. Therefore, the twenty ob- 
servations recorded in Figure 1 sample a 
number of situations ranging from strange 
ones to relatively familiar ones. The mere 
fact that the strangest situation occurs first 
and that the familiarity increases with time 
does not invalidate the usual interpretation
of the correlation between competitiveness
and social poise. In order to understand
that correlation, the fact that
familiarity is uncontrolled must be taken 
into account, but the correlation is mathe- 
matically valid if situational dependence is 
the sole form of dependence. 

At the risk of unnecessary repetition, 
this discussion might be carried a bit
farther. The usual definitions of random- 
ness and probability are phrased in such 
a way as to emphasize unduly the temporal
order of the data, i.e., by speaking
of a random sequence approaching a limit. 
The existence of a lawfulness in the data 
when plotted in temporal order seems to 
be much more disturbing than the exist- 
ence of a lawfulness when the data are 
rank ordered with respect to some other 
dimension of variation. In Figure 2, for 
example, twenty pairs of mathematics and 
English scores are plotted. The order
along the abscissa is determined by the
rank of the individual in I.Q. This situ- 
ation where the data may be ordered with 
respect to some variable in such a way that 
trends are evident, does not lead to the 
insistence that such trends must be par- 
tialled out before correlations are mean- 
ingful. What is the unique position of the 
temporal dimension which makes it so im- 
portant that data can be randomized with 
respect to it? Our answer, of course, is 
that time is not a unique dimension.

[Figure: Mathematics and English scores of a sample arranged in order of I.Q.]

This type of dependence is called situational
dependence because each temporal
position represents a unique situation
which puts certain constraints upon the
measurements. In Figure 3, some hypothetical
distributions of responses on each
of five consecutive days are illustrated. In
strange situations, such as represented by
the first day in nursery school, the mean
and sigma of the hypothetical distributions
of responses might be low, as shown
in Figure 3. In the circumstances represented
by the second day, the mean might
be slightly higher and continue to increase
with each succeeding day. It is important 
to note, however, that according to this 
formulation, the position of the obtained 
measurement in the first day’s distribution 
has no effect on the position of the second 
day’s measurement in its distribution. 
Whether that sort of independence actu- 
ally holds for the nursery school data in 
Figure 1, is perhaps questionable, but 
situational dependence as outlined above 
can certainly exist. Its existence does not 
invalidate the correlation coefficient. 
What we have called situational depend- 
ence is probably not dependence at all in 
a strict sense. The model to which such 
data has been fitted is that of a series of 
populations whose means are correlated 
with time. The sample of successive meas- 
ures is considered a single random meas- 
urement taken from each of the popula- 
tions in turn. The individual members of 
such a sample are independent of each 
other even though there is a temporal 
trend, a high serial correlation, or other
such measure of the relationship with
time. 
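
The model just described can be sketched numerically. What follows is a minimal illustration under assumed values (200 days, a linear rise in the daily mean, unit sigma, all hypothetical), not a reanalysis of the nursery school data. It draws one independent measurement per day and compares the serial correlation of the raw scores with the serial correlation of each score's position within its own day's distribution.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_situational_series(n_days=200, slope=0.1, sigma=1.0, seed=1):
    """One independent draw per day from a population whose mean rises with time."""
    rng = random.Random(seed)
    means = [slope * day for day in range(n_days)]
    scores = [rng.gauss(mu, sigma) for mu in means]
    return means, scores

means, scores = simulate_situational_series()

# Serial (lag-1) correlation of the raw scores: inflated by the trend in the means.
serial_r = pearson(scores[:-1], scores[1:])

# Serial correlation of each score's position within its own day's distribution.
positions = [s - mu for s, mu in zip(scores, means)]
position_r = pearson(positions[:-1], positions[1:])

print(f"raw serial r = {serial_r:.2f}; within-day position serial r = {position_r:.2f}")
```

Under these assumptions the raw scores show a high serial correlation produced entirely by the trend in the means, while the positions within successive distributions remain essentially uncorrelated.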

Perhaps it is worth noting that it is 
only the independence of successive meas- 
ures which is under discussion. The inter- 
pretation of the correlation shown in Fig- 
ure 1 depends also upon the normality, 
homoscedasticity, etc. These characteris- 
tics of the data are obviously just as im- 
portant in time series correlations as in 
any kind of data. Some of the criticisms of
correlations of time series are based on the
fact that temporal data is rarely normally 
distributed. With such criticism the author 
has no quarrel provided that correlations 
of other data are criticized with equal 
severity. 


MEASURE-TO-MEASURE DEPENDENCE


There is another type of dependence 
of one measure on another, which we have 
called measure-to-measure dependence. If 
the individual's behavior on the second 
day is influenced not only by the fact that 
he experienced the situation on the pre- 
vious day but also by the fact that he be- 
haved a certain way on the previous day, 
then there is a measure-to-measure de- 
pendence. Illustrating this second kind of 
dependence is the individual who has to 
be consistent from one time to the next.
His second response would be kept consistent
with the first even though the general
surrounding conditions at the time of
the second response might lead to a dif- 
ferent sort of behavior, i.e., the person 
who gives the same answer twice to a 
question because he remembers what he 
said before rather than because he feels 
the same way he did before. A person 
who feels he has a reputation to uphold 
might behave the same way from one day 
to the next because he was committed by 
his first day’s behavior. Or contrariwise, 
an individual might deliberately change 
his second response if he felt the first one 
was wrong or if he felt he should not get 
into a rut. 

This sort of dependence need not be 
illustrated only in terms of personality 
characteristics. The reason that successive 
draws from a finite pack of cards are de- 
pendent is not so much because there have 
been previous draws or because the pack 
has fewer cards but rather because the 
results of the previous draws have changed 
the probabilities on a later draw. In suc- 
cessive choices in a maze, the second 
choice may be dependent upon the first 
because the characteristics of the second 
choice may depend upon what the first 
choice was. In some sorts of learning situations,
the results of one trial give the individual
insight into the problem and thus
change his behavior on later trials. In
other sorts of learning situations, it seems
as if the first trial merely initiated a 
growth process which leads to improved 
behavior the second time but that it makes 
little difference just what was the precise 
response to the first trial. If the behavior 
on the first trial changes later behavior, 
then there is measure-to-measure depend- 
ence; if it is merely experiencing the situ- 
ation on the first trial which changes be- 
havior then there is situational depend- 
ence. 
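
The card illustration can be made concrete with a small computation. The sketch below assumes a standard 52-card pack containing four aces (our choice of example; the function name is hypothetical). It compares the probability of an ace on the second draw after two different results of the first draw. The number of previous draws is the same in both cases; only their result differs.

```python
from fractions import Fraction

def prob_ace_on_next_draw(aces_already_drawn, cards_already_drawn, aces=4, deck=52):
    """Probability that the next card is an ace, given what earlier draws produced."""
    return Fraction(aces - aces_already_drawn, deck - cards_already_drawn)

# One card has been drawn in both cases; only the result of that draw differs.
after_ace = prob_ace_on_next_draw(aces_already_drawn=1, cards_already_drawn=1)
after_non_ace = prob_ace_on_next_draw(aces_already_drawn=0, cards_already_drawn=1)

print(after_ace, after_non_ace)  # 1/17 4/51
```

The two probabilities differ although the pack has lost exactly one card either way, which is the sense in which the dependence is measure-to-measure rather than situational.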

Analogies to these two kinds of de- 
pendence can be found in the more usual 
cross-sectional correlation study. If a 
sample of individuals were individually 
tested, the first one starting in the early 
morning and the last one not being tested 
until late at night, there would probably 
be a fatigue factor in the scores. A good 
experimenter would try to eliminate it 
in one fashion or another but its presence 
would not invalidate the correlation be- 
tween two variables in the performance. 
If, however, half of the sample copied the 
answers of the other half of the sample, 
there would be a dependence among the 
scores which would invalidate the correla- 
tion. 

In what sense is the correlation invali- 
dated if there is measure-to-measure de- 
pendence within the sample? Thinking 
first of the specific illustration used above 
where half of the sample copied the papers 
of the other half, it is clear that the num- 
ber of degrees of freedom was cut in half 
by the dependence. Only half of the total 
sample had independent measurements 
and the second half merely counted the 
first half over again. Thus the correlation 
is invalid because the sampling distribu- 
tion is not accurately described by the 
usual reliability formulae. 
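
The copying illustration can be checked directly. The following sketch uses hypothetical scores for 20 independent individuals and lets a second group of 20 copy them exactly. The large-sample standard error formula used here, (1 - r^2) / sqrt(n - 1), is our assumption about what "the usual reliability formulae" would give.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_se(r, n):
    """Large-sample standard error of r (assumed formula)."""
    return (1 - r ** 2) / math.sqrt(n - 1)

rng = random.Random(7)
# Hypothetical scores on two variables for the half that did its own work.
x = [rng.gauss(0, 1) for _ in range(20)]
y = [0.5 * a + rng.gauss(0, 1) for a in x]

# The other half copies the first half, so every pair of scores appears twice.
x_all, y_all = x + x, y + y

r_half = pearson(x, y)
r_all = pearson(x_all, y_all)

print(f"r from the 20 independent pairs: {r_half:.3f}")
print(f"r from all 40 pairs:             {r_all:.3f}")
print(f"standard error taking n = 40:    {approx_se(r_all, 40):.3f}")
print(f"standard error taking n = 20:    {approx_se(r_half, 20):.3f}")
```

The coefficient is unchanged by the copying, so its sampling behavior is that of a sample of 20; entering n = 40 in the formula understates the sampling error.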

In successive measurements on the 
same individual, such a clear-cut situation 
could conceivably arise, but generally the 
degrees of freedom would be decreased by 
an unknown amount. In some cases, the 
number of degrees of freedom might be 
increased. In certain kinds of situations 
therefore an accurate sampling distribu- 
tion might be obtained by a modification 
of the number of degrees of freedom. 
These are the ones in which the effect of 
the dependent relationships is to break up
the total sample into smaller groups within
which the dependence decreases or in- 
creases the variance relative to the whole 
sample. In more complex kinds of inter- 
relationships there is no reason to suppose 
that accurate sampling distributions could 
be obtained in any such manner. 


DISCUSSION


The fruitfulness of the usual methods of 
time series analysis in handling intra-indi- 
vidual data is difficult to appraise. Cer- 
tainly those methods are valid in the sense 
that the experimenter is free to partial out 
any portion of the variance which he 
wishes. There is no guarantee that every 
kind of dependence will make itself known 
by the appearance of trends or cycles in 
the data, but if the data, either before or 
after removal of part of the variance, meets 
the various tests of randomness which 
might be applied there is nothing the ex- 
perimenter can do but proceed as if it were 
random.

The objection to the usual methods 
when applied to intra-individual serial 
measurements is that they remove parts of 
the variance which the experimenter may 
very much wish to keep. Particularly if 
the object of the experiment is to perform 
a factor analysis, the experimenter will 
not want to remove any more variance 
than is necessary to obtain interpretable 
correlations. He would much prefer to 
let the factor analysis separate the vari- 
ance into various parts rather than take 
out certain arbitrary parts ahead of time. 
Therefore, it is often preferable not to 
partial out any trends or cycles unless it 
is absolutely necessary for the proper in- 
terpretation of the correlation matrix. In 
economic research, the removal of trends 
is apparently consistent with the reason- 
able interpretation of intercorrelations of 
economic factors, but in intra-individual 
data that sort of preliminary treatment of 
the data is not necessary for the psy- 
chological interpretation of the results. 

Therefore, it will be very important for 
progress in the statistical analysis of in- 
tra-individual data to develop mathe- 
matically sound methods of handling time 
series analysis which are as compatible as 
possible with various psychological desiderata.


SUMMARY


In summary, this discussion has been 
an intuitive analysis of the problems in- 
volved in interpreting intra-individual 
correlation coefficients. The methods com- 
monly recommended for the analysis of 
time series are valid but for the analysis 
of psychological data they separate the 
total variance into portions which are 
not maximally useful. Furthermore, they 
seem unnecessarily stringent. There are
situations in which a temporal trend does
not seem to invalidate the correlation co- 
efficient. There are other situations in 
which it seems as if an accurate sampling 
distribution might be derived by an ad- 
justment of the degrees of freedom. Fur- 
ther mathematical analysis of the prop- 
erties of time series seems like a profitable 
investment of effort and the techniques 
which such an analysis might produce 
would permit a more meaningful analysis
of longitudinal data.


The diagnosis of guilt, or more specifically,
the detection of deception in criminal
investigations is analogous but inverse
in structure to the clinical problem of 
personality diagnosis by means of the so- 
called projective techniques. In the clini- 
cal case the patient suffers from some 
emotional disturbance the nature of which 
is unknown to clinician and patient alike 
because of unconscious repression. This 
capricious but purposive force may at- 
tempt to thwart the probings of the clini- 
cian in many ways, the most embarrassing 
being the distortion of the unconscious 
problem through apparently misleading 
signs or symptoms. It is quite different 
in the criminal investigation. Here the 
guilty culprit is the only unknown factor. 
The nature of the crime is clearly and 
definitely known both to the guilty per- 
son and to the investigator. Whatever 
distortions occur are traceable directly to 
the conscious attempts of the criminal to 
deceive the interrogator. 

As a clinical problem the probing of a 
criminal suspect for purposes of detect- 
ing deception imposes rather severe re- 
strictions of a statistical and experimental 
nature. Life and death, personal liberty, 
character and reputation are at stake. It 
is essential therefore not only to minimize
but practically to eliminate the error of
diagnosing an innocent individual as a 
criminal. But from the police point of 
view it is also imperative to prevent too 
many criminals from being released on 
the say-so of a kindly or timid psycholo- 
gist unwilling to assume the responsibility 
of so important a decision. If now, the 
practical restriction of limited time, and 
therefore a limited number of observa- 
tions, is further imposed on the investi- 
gating process one can readily see that no 
statistical test can adequately and effec- 
tively meet the limiting form of these re- 
quirements. 

It is responsibilities and difficulties 
such as these that force the psychologist 
engaged in criminal work to fall back 
upon absolutely rigorous experimental 
controls supplemented by the necessary 
but simple statistical tools. By presenting
what has been done in the probing
for and diagnosing of criminal guilt we 
hope to indicate, indirectly at least, that 
many of the difficulties inherent in pro- 
jective investigations, result more from 
the neglect of rigorous experimental con- 
trols than from the complexity of the 
clinical problem. Too often have psychologists
manifested too dependent an attitude
toward a specific test or instrument
used in diagnosis despite inadequate statistical
or experimental validation. A
challenge of the test would arouse vigor- 
ous reactive defenses through the medium 
of complex verbalizations with such impressive
terms as “interactions,” “total
situations” and “intuitions.” Occasionally
a dignified retreat might be observed
in the promise for a search for more com- 
plex statistical procedures to meet the ad- 
mittedly complex clinical problem. And 
yet, it might have been simpler to change 
the test or to replace it with a new one. 
Preliminary work on techniques for de- 
tecting deception, though very promising, 
was obscured by too great and too fre- 
quent a dependence upon non-objectifiable 
intuitions. In this respect the analogy 
to projective techniques is quite obvious. 
The results, though exceeding chance ex- 
pectation, were not accurate enough for 
the critical decisions demanded in this
work, nor, in the eyes of the skeptical
police, were they greater than those obtained
by a skilled police interviewer.
There was only one scientific answer to 
the problem: experimental control. However,
the variables amenable to control
were numerous. A sensitive instrument,
with a high degree of precision, had to
be devised and calibrated to yield objective
reactions while the suspect was being
interrogated. These reactions had to
be free from control by the suspect since 
it was evident that he would consciously 
try to exercise such control in order to 
deceive the operator. A questionnaire 
technique had to be standardized and vali- 
dated. It had to be practicable, natural, 
reliable and specific, the latter to elimi- 
nate ambiguity of interpretation. Though 
preliminary in nature, this basic ground- 
work was possible only through exacting 
and piecemeal experimentation aided by 
appropriate but relatively simple statisti- 
cal analyses. From this had to emerge a 
criterion efficient in discriminating truth 
from falsehood, and free, as far as possible,
from subjective interpretation.
Since, in the final analysis, lie detec- 
tion techniques depend upon a “conflict” 
situation and the attendant “emotional”
disturbances that accompany such a conflict,
the criterion had to be independent
of the rather pronounced inter- and intra-individual
variability in emotional responsiveness.
Finally, the criterion had to
be a good discriminator which, in this 
problem, was merely the accuracy with 
which it could make correct identifications 
of guilt or innocence. 

Simple statistical requirements prompt- 
ed the repetition of the elements involved 
in obtaining the criterion. Because of 
psychological and experimental consid- 
erations the elements were varied from 
test to test in order to eliminate complete 
habituation or capricious association and 
to provide a further basis for an evalua- 
tion of consistency in reaction. Thus a 
valid estimate of uncontrolled variation 
was available. 

The nature of the criterion stemmed 
from the structure and content of the 
questionnaire. In essence it involved a 
comparison of the objective instrumental 
reaction to critical questions and corre- 
sponding reactions to emotional standards. 
Critical questions were of the direct Yes 
or No type specific to the various aspects 
of the crime being investigated. Emo- 
tional Standards were highly charged 
emotional issues selected from a study of 
the life history of the suspect—issues 
which tended to produce marked anxiety 
or embarrassment and which the suspect 
would ordinarily refuse to discuss or re- 
sent being questioned about. The statis- 
tical basis for the criterion distinguishing 
truth from falsehood is a ratio of two 
universes of responses: the responses to
critical questions and the responses to 
emotional standards. For the innocent 
suspect both universes possess similar 
characteristics of embarrassment, resent- 
ment and anxiety. But for the guilty in- 
dividual the responses to critical questions 
(those on which he is attempting deception)
possess the added quality of intense
conflict with a pronounced increase in 
emotional reactivity. In terms of meas- 
urement, the ratio of these two universes 
of responses tends to approach 1 in truth- 
ful situations but increases sharply under 
conditions of deliberate deceit. Extensive 
experimental verification established the 
basic validity of this over-all test under 
diverse conditions, except for rather
specific irregularities during initial responses.
In some innocent individuals
and particularly in “anxiety” patients the
ratio was markedly greater than 1 at the 
outset of the testing period but rapidly 
converged to unity upon continuous rep- 
etition of the procedure. Since no such 
general adaptation effects were noted for 
guilty individuals who attempted decep- 
tion, a new discriminating criterion was 
available to make the diagnosis of truth 
and falsehood still more precise. The 
obvious and simple statistical test for the 
validity of this new criterion was that in- 
volved in testing slopes. 
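
The two criteria can be sketched with invented figures. In the illustration below every reaction magnitude is hypothetical (arbitrary units) and the function names are ours; the sketch computes the ratio of mean reaction to critical questions over mean reaction to emotional standards for each of four successive tests, and then the least-squares slope of that ratio across tests. Only the slopes are computed; no significance test on them is carried out here.

```python
def criterion_ratio(critical, standards):
    """Mean reaction to critical questions divided by mean reaction to emotional standards."""
    return (sum(critical) / len(critical)) / (sum(standards) / len(standards))

def slope_over_tests(ratios):
    """Least-squares slope of the ratio against test number (1, 2, 3, ...)."""
    n = len(ratios)
    tests = list(range(1, n + 1))
    mean_t = sum(tests) / n
    mean_r = sum(ratios) / n
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(tests, ratios))
    sxx = sum((t - mean_t) ** 2 for t in tests)
    return sxy / sxx

# Hypothetical data: one (critical reactions, standard reactions) pair per test.
anxious_innocent = [([9.0, 8.0], [5.0, 5.0]), ([7.0, 6.0], [5.0, 5.0]),
                    ([5.5, 5.5], [5.0, 5.0]), ([5.0, 5.0], [5.0, 5.0])]
deceptive = [([10.0, 10.0], [5.0, 5.0]), ([9.5, 10.5], [5.0, 5.0]),
             ([10.0, 11.0], [5.0, 5.0]), ([10.0, 9.5], [5.0, 5.0])]

innocent_ratios = [criterion_ratio(c, s) for c, s in anxious_innocent]
deceptive_ratios = [criterion_ratio(c, s) for c, s in deceptive]
innocent_slope = slope_over_tests(innocent_ratios)
deceptive_slope = slope_over_tests(deceptive_ratios)

print("anxious innocent:", [round(r, 2) for r in innocent_ratios], round(innocent_slope, 3))
print("deceptive:       ", [round(r, 2) for r in deceptive_ratios], round(deceptive_slope, 3))
```

In this invented case the anxious but innocent record starts well above 1 and falls toward unity, giving a clearly negative slope, while the deceptive record stays high with a slope near zero.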

One further fact becomes apparent
from the analysis of this technique and
its application. Since both universes of
responses, that is, the “critical” and “emotional,”
necessarily reside within the same
individual, the criminal suspect is essen- 
tially and always his own control. And 
since the discriminating ratios are ob- 
tained from contiguously paired “emo- 
tional” and “critical” questions, this fea- 
ture of experimental design tends to de- 
crease an additional source of uncon- 
trolled variation. 

Thus far we have emphasized the ne- 
cessity of rigid experimental analysis and 
re-analysis of all the essential features of 
the test process and have indicated that 
the correlative statistical analysis of the 
isolated but clearly defined elements was 
a relatively simple one. We have noted 
that decisions of importance, and espe- 
cially those in which vague answers are 
not tolerated, tend to force the clinician 
into an experimental re-evaluation of his 
tools. In our particular work concerted 
effort to objectify the clinical aspects of 
the examination of a criminal suspect 
finally yielded simple and relatively ob- 
jective criteria. But though the ideal 
had been to eliminate or at least control 
the subjective elements it is apparent 
that the validated criteria could not be 
expected to exhaust all the possibilities of 
diagnosis inherent in even the relatively 
simple and clear cut problem presented 
in criminal investigation. Actual prac- 
tice demonstrates that it becomes well- 
nigh impossible to eliminate all traces of 
subjectivity. In investigating the influ- 
ence of subjective factors in this type of 
criminal investigation the accuracy of 
three types of diagnoses was determined:
one was based on the measurable and
purely objective application of the vali- 
dated criteria; another was made by an 
independent expert who analyzed only 
the records; the last was that of the 
operator of the instrument who both ex- 
amined the subject and visually analyzed 
the records. On the basis of the first 
test record alone no significant differences 
in accuracy were noted among these three 
discriminators, though the operator was 
better than the independent expert, who, 
in turn, was more accurate than the objective
criterion. As additional tests
were administered, the discriminating 
capacity for the three discriminators in- 
creased, with final accuracy of 95-99% 
for the operator, 88-90% for the inde- 
pendent expert and approximately 85% 
for the purely objective criteria. Appar- 
ently the independent expert was utiliz- 
ing non-measured and perhaps non- 
measurable criteria for diagnosing guilt 
or innocence; the operator who spoke 
to and questioned the suspect was influ- 
enced by still other qualitative indicators 
not measured by the objective criterion. 
It seems reasonable to assume that other 
than the evolved measurable criteria 
were used, and to advantage. Undoubt- 
edly this is also what happens in projec- 
tive diagnoses where the expert most 
likely, and perhaps unconsciously, uses 
other than specific test cues in arriving 
at a diagnosis. 

Experimental and statistical considerations
suggest the need for additional
criteria and increased observations to 
minimize the differences among these 
three diagnostic situations. We are con- 
tinuing to make objective inroads upon 
subjective intuitions in our search for 
additional valid criteria but find it psy- 
chologically non-feasible to insist on too 
many tests from prospective suspects. 
But despite lack of perfection in diag- 
nosis, the accuracy of the operator in 
distinguishing truth from falsehood 
through instrumental technique is sufficiently
high to be a valuable adjunct in
criminal investigations. 

While continuing the research to purify 
criteria and objectify additional intui- 
tions, we introduced a new category of 
decision in order to delimit the error in
diagnosis still further. Records are
known to possess degrees of clarity and
interpretability. Movements, whether
conscious or unconscious, tend to mar the 
clarity of an individual's electrical re- 
sponses. Furthermore, the magnitude of 
the criterion ratio varies among the in- 
nocent as well as among the guilty and a 
very large or very small ratio inspires
great confidence in diagnosis. Finally 
there are some individuals, who, for rea- 
sons as yet unknown, tend to give records 
that are either ambiguous or of doubtful 
interpretation. For these reasons it became
desirable to introduce a No-Decision
category which on practical and theoretical
grounds minimizes the errors
of diagnosis. The basis for action in any 
individual case is a function of the mag- 
nitude of the objective criterion and also 
depends upon the subjective evaluation of 
certain as yet unmeasured characteristics
of the record. Since, in the last analysis,
the decision rendered to the police officers
is a reflection of the operator’s confidence
and experience, the best practical precaution
to minimize the errors of diagnosis
was to restrict the region of both “guilty”
and “innocent” diagnoses by creating the
new region of No-Decision when the ratio
was not within the best discriminating 
range and the subjective evaluation of 
other aspects of the record created a 
doubtful attitude in the mind of the op- 
erator. It seems more reasonable to be 
aware of the limitations of one’s instru- 
mental technique and admit inability to 
arrive at a decision than it is to make an 
erroneous diagnosis which may have a 
serious outcome. 
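
The structure of this three-way decision can be sketched as a simple rule. The numerical cut-offs below are hypothetical, since no limits for the best discriminating range are stated here, and the function and its labels are ours. The sketch covers only the objective ratio and a yes-or-no summary of the operator's doubt about the record.

```python
def classify(ratio, record_doubtful, innocent_max=1.2, guilty_min=1.8):
    """Classify one examination from the criterion ratio and the operator's doubt.

    innocent_max and guilty_min are hypothetical cut-offs bounding the region
    in which the ratio discriminates poorly.
    """
    if ratio >= guilty_min:
        return "guilty"
    if ratio <= innocent_max:
        return "innocent"
    # The ratio lies outside the best discriminating range.
    if record_doubtful:
        return "no decision"
    return "left to operator's judgment"

for ratio, doubtful in [(1.0, False), (2.4, True), (1.5, True), (1.5, False)]:
    print(ratio, doubtful, "->", classify(ratio, doubtful))
```

The rule withholds a diagnosis only when both conditions hold, an indecisive ratio and a doubtful record, which is the restriction described above.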

With this final restriction in operation 
the results in criminal investigations were 
most gratifying. No errors of diagnosis 
have been reported as yet in a criminal 
case load of over 500 examinations. 
Though impressive this result must be 
evaluated with respect to the following 
considerations. The No-Decision cate- 
gory was rather large, 10% of the case 
load; the guilty group comprised another
10%; while the innocent diagnoses made
up the bulk, or 80%, of the cases. This
latter figure is not unusual in practical 
criminal work because many more are 
suspect than are guilty. Furthermore,
not more than 50% of the innocent decisions
have as yet been absolutely confirmed
since verification depends upon the
ultimate seizure and successful prosecu- 
tion of the guilty individual. However, 
the overall results are most encouraging 
and serve to motivate the tedious experi- 
mental search for additional criteria and 
the continual re-evaluation of the current 
procedures. 
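
The arithmetic behind these considerations is easily laid out. The sketch below takes 500 as a round figure for the case load (the text says only "over 500") and, as an added assumption of ours, treats confirmed diagnoses as independent trials in order to compute the exact one-sided 95% upper limit on the error rate when no errors are observed.

```python
case_load = 500  # round figure; the text says "over 500"

no_decision = case_load * 10 // 100
guilty = case_load * 10 // 100
innocent = case_load * 80 // 100
confirmed_innocent_at_most = innocent * 50 // 100

def upper_limit_zero_errors(n, confidence=0.95):
    """Exact one-sided upper limit on the error rate after zero errors in n trials."""
    return 1 - (1 - confidence) ** (1 / n)

print(no_decision, guilty, innocent, confirmed_innocent_at_most)
print(f"{upper_limit_zero_errors(confirmed_innocent_at_most):.4f}")
```

On these assumptions, zero errors among at most 200 confirmed innocent decisions is still compatible with a true error rate of roughly one and a half percent, which is why the unconfirmed half of the innocent decisions matters to the evaluation.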

The evolution of this lie-detection tech- 
nique serves to show that a clinical tool 
can attain a practical degree of objec- 
tivity—the ultimate goal of any scientific 
pursuit. Wherever subjective elements 
necessarily influence the final decision 
simple statistical considerations will show 
that an acceptance of the limitations of 
the instrument or technique will diminish 
the errors its indiscriminate use would 
produce. A great deal of concern over 
such issues and some suggestions for
their solution have been appearing
continuously in the literature. 

The often-reported distrust of statis- 
tics and experimental method on the part 
of clinicians should be tempered by the 
knowledge that whenever there is clarity 
of objective, experimental control is al- 
ways welcomed. And where experi- 
mental control becomes efficacious, statis- 
tical procedures need not become compli- 
cated. But the clinician should remember 
that clinical intuitions do not aid the prog- 
ress of mankind unless they are transmis- 
sible and can be used efficiently by others. 
Such goals, however, cannot be reached 
without experimental and statistical con- 
trol. 

However, the present clinical fad 
seems to be a search for an instrument 
which gives a multi-dimensional answer. 
Thus, one projective technique has been 
used as a psychiatric diagnostic tool, a 
psychological test of personality, an intelligence
test, an aptitude test and a
brain damage test. Our research has 
been in quite the opposite direction. We 
have devised and validated our technique 
to get but one specific answer in a re- 
stricted area of operation. Perhaps 
clinical psychology might be better served 
by developing precise unidimensional techniques
which give clear-cut answers to a
specific issue rather than hoping to devise
an instrument that gives vague and ambiguous
answers to all questions.

Our work in lie-detection demonstrates 
the possibility of maximizing the objecti- 
fication of our clinical intuitions through 
the rigorous application of experimental 
controls which do not necessarily involve 
the application of complex statistical pro- 
cedures. This is possible if the problem 
is systematically attacked and the instru- 
ment or technique not forced to give a 
multi-dimensional clinical decision in 
vague multi-dimensional language.


PROBLEMS OF MULTI-JUDGE RELIABILITY


A frequently used method for assessing
reliability of judgments or ratings is
to have several judges rate the same ma- 
terial and then determine how well the 
judges agree. Corresponding to retest 
reliability, split-half reliability and other 
types of reliability this method may be 
termed “multi-judge” reliability. In gen- 
eral, multi-judge reliability involves either 
estimating the average intercorrelation 
among the several judges or intercorre- 
lating the series of judgments obtained 
from each judge with the series obtained 
from every other judge and then com- 
puting the mean intercorrelation. 
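
The second procedure, correlating every judge with every other judge and averaging, can be sketched as follows. The ratings are hypothetical (three judges, six cases) and the function names are ours; the sketch is meant only to show the bookkeeping.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_intercorrelation(ratings_by_judge):
    """Average of the correlations over every pair of judges."""
    pairs = list(combinations(ratings_by_judge, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def number_of_pairs(n_judges):
    """Number of distinct judge pairs, n(n - 1) / 2."""
    return len(list(combinations(range(n_judges), 2)))

# Hypothetical ratings: one row per judge, one column per case.
ratings = [
    [0, 1, 2, 3, 1, 4],
    [0, 2, 2, 3, 1, 3],
    [1, 1, 2, 4, 0, 4],
]

mean_r = mean_intercorrelation(ratings)
print(f"mean intercorrelation = {mean_r:.3f}")
print("pairs among 9 judges:", number_of_pairs(9))
```

With 9 judges there are 36 distinct pairs, so a matrix of intercorrelations among 9 judges supplies 36 coefficients to be averaged.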

At the Institute of Welfare Research 
of the Community Service Society of 
New York this multi-judge technique 
has been used in the studies called “measuring
movement in casework.” One aspect
of these studies has been the development
of a scale for judging movement
or change which occurs in a client and/or
his situation between the time the case
is opened and the time it is closed. This movement
scale is characterized by having
anchoring case illustrations and was con- 
structed according to psychophysical 
principles. A history and description of
the scale can be found in several publications (1, 2).

In brief, each judge, be it the worker on
the case or another person, applies the
standardized scale to a structured case
summary and makes a rating which may
vary from -2, indicating deterioration,
to +4, signifying marked improvement. 
In the course of the development and ap- 
plication of this scale a persistent statisti- 
cal problem has confronted the investiga- 
tors. This statistical problem may be 
broken down into two major null hy- 
potheses which present themselves for 
consideration when the reliabilities of 
multi-judge ratings are being compared. 
The first of these null hypotheses is: 
Do groups with varying levels of case- 
work experience, but trained similarly in 
application of the scale, exhibit no differ- 
ences in intra-group agreement? The 
second null hypothesis is: Does a single 
group display no difference in inter-judge 
agreement as a result of training in the 
use of the scale? 

As can be readily observed, these two 
questions are merely specific examples of 
a broader problem involving how to evaluate
differences in sets of ratings or scores
obtained, in the first instance, from n
groups of k individuals or, in the second 
instance, from one group of k individuals, 
n times. It is obvious that the aim of any 
agreement-increasing technique, whether 
it deals with the instrument for obtain- 
ing ratings or with training methods, is 
to secure maximum agreement among the 
judges. Accordingly, the critical problem 
is how best to determine whether alterna- 
tive methods designed to increase agree- 
ment among judges really do have such 
an effect. 

As an illustration of one phase of the 
problem of “testing” differences in re- 
liability between groups we have chosen 
one sample of cases from the files of the 
movement study. Table 1 presents two 
matrices of intercorrelations obtained 
from a group of 9 caseworkers chosen at 
random from the staff of the Family 
Service Department. These intercorrela- 
tions were determined from the ratings 
of these 9 workers for 38 case summaries. 
In the sector above the diagonal are the 
correlations between the judges when the 
workers used the manual and instruction
sheet but had no training or practice in
the use of the scale. The sector below
the diagonal presents the correlations be- 
tween the judgments of the workers after 
a training period of six hours. This 
situation thus involves the reapplication 
of the scale to the same cases by the same 
workers after a period of training and 
discussion with practice cases. Since the 
two sets of ratings are likely to be cor- 
related, the test of the difference between 
the sessions is even more complex than 
would be the case for independent groups. 
Nevertheless, in all of the methods we 
have used for testing the difference, we
have tended to treat the sets as independent
under the assumption that conclusions
thus obtained would probably be more 
conservative. If we had had more con- 
fidence in our methods such an assump- 
tion need not have been made. 

TABLE 2. Significance of difference between
sessions treating correlations as raw scores
(X = r)

Method 1: Treating correlations as raw
scores. Table 2 presents the t-ratio
and P-value found when the correlation
coefficients are treated as raw scores.
Here it is seen that the mean intercorrelation
before training was .718 and the
mean intercorrelation after training was
.778. The standard deviations for the
two sets of 36 scores were .057 and .059
respectively. The t-ratio found was 4.30.
If we use 70 as the df, the P-value is
less than .001. However, since these
correlations actually involve only 9 workers
it may be that a better estimate of the
df is something less than 70. Even with
df as low as 8, however, which is one less
than the number of workers, the P-value
associated with the t-ratio is less than .01.
This test of significance seems to indicate
that agreement has been improved by
training. Nevertheless, the important
question remains as to whether or not it
is appropriate to use Student’s t in such
a situation. Would transformation of
the r’s to z’s have improved this technique?
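
The figures of Method 1 can be retraced from the summary values alone. The sketch below is ours and rests on stated assumptions: the t-ratio is taken to be the pooled-variance form for two independent groups, the P-values are two-tailed, and they are obtained by Simpson's-rule integration of the t density rather than from tables. The last function gives the r-to-z transformation, z = arctanh(r), raised in the closing question.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t-ratio for two independent groups from means and SDs."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean2 - mean1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def t_density(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P by Simpson's rule over [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)

def fisher_z(r):
    """Transformation of r to z."""
    return math.atanh(r)

t_recomputed = t_from_summary(0.718, 0.057, 36, 0.778, 0.059, 36)
print(f"t recomputed from rounded summary values: {t_recomputed:.2f}")
for df in (70, 8):
    print(f"df = {df}: two-tailed P for t = 4.30 is {two_tailed_p(4.30, df):.5f}")
print(f"z for r = .718: {fisher_z(0.718):.3f}; z for r = .778: {fisher_z(0.778):.3f}")
```

The recomputed t differs slightly from the reported 4.30, presumably because the means and standard deviations are rounded and the exact variance formula is not stated. The df of 70 corresponds to 36 + 36 - 2.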

TABLE 3. Significance of difference between
sessions treating standard deviations of case
ratings as raw scores (X = S.D.)

Method 2: Treating standard deviations
of case ratings as raw scores. Table 3 
presents the results when the standard 
deviations of ratings for each case are 
treated as raw scores and a t-ratio ob- 
tained. Before training, the mean stand- 
ard deviation was .636 while after train- 
ing it was found to be .540. The t-ratio 
resulting from this comparison was 2.20. 
With df of 74 the P-value is approxi- 
mately .03. Once again there is a ques- 
tion about the most appropriate number 
of df. With df of 8 the P-value would 
be approximately .06. Again the vexing 
problem arises as to whether or not such
standard deviations may logically be
treated as raw scores.


Method 3: Analysis of Variance. Table 
4 presents the analyses of variance for 
each of the two sessions. All ratings 
were made positive by adding 2 to each 
score. May cross-comparisons be made 
between similar terms in these two analy- 
ses for “testing” the difference in agree- 
ment between the before and after train- 
ing sessions? We have approached this 
problem with little confidence, appreciat- 
ing the fact that variance estimates and 
F-ratios for different groups cannot be 
compared willy-nilly. If judges agreed 
perfectly in each session, however, the 
source of variation “between cases” 
would make up the entire variance ex- 
hibited by the scores. The component of 
variance which is most likely to reflect 
a difference in amount of agreement is 
thus reasoned to be the “within cases” 
variation. Accordingly, we have pro- 
posed that the MS's or variance estimates 
of “within cases” be used as numerator 
and denominator of a cross-session F- 
ratio. If we take the df of these two 
variance estimates as they stand, the P- 
value for the cross-session F is less than 
.01. On the other hand, if the proper
number of df were actually less than, say,
100 for each of the variance estimates,
the associated P-value would be greater
than .05. In this case the P-value is
markedly dependent on the choice of df.

TABLE 4. Analyses of variance of judges' ratings
(entries shown as ... are not legible in the source)

Before training
Source           Sum of squares   Degrees of freedom   Mean square   Est. % of total variance
Between cases    ...              37                   10.55         68%
Within cases     ...              304                  .50           32%
  Judges         ...              8                    1.74          2%
  Remainder      ...              296                  .47           30%
Total            ...              341                                100%

After training
Source           Sum of squares   Degrees of freedom   Mean square   Est. % of total variance
Between cases    354.09           37                   ...           ...
Within cases     108.67           304                  .36           25%
  Judges         ...              8                    ...           ...
  Remainder      ...              296                  ...           ...
Total            ...              341                                ...

Cross-session F: .50/.36 = 1.39
P < .01 if df = 304 and 304
P > .05 if df < 100 and 100
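The dependence of the P-value on the df can be illustrated numerically. The sketch below is not the procedure used in the study; it is a Monte Carlo approximation, under the assumption that the ratio of the two "within cases" mean squares follows an F distribution, and the choice of 60 df as an example of "less than 100" is ours.

```python
import random

def f_tail_probability(f_observed, df1, df2, draws=20000, seed=0):
    """Monte Carlo estimate of P(F >= f_observed) for an F(df1, df2) variate."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        # A chi-square variate with df degrees of freedom is Gamma(df/2, scale 2).
        numerator = rng.gammavariate(df1 / 2, 2) / df1
        denominator = rng.gammavariate(df2 / 2, 2) / df2
        if numerator / denominator >= f_observed:
            exceed += 1
    return exceed / draws

f_cross = 0.50 / 0.36  # cross-session F from the two "within cases" mean squares

p_full = f_tail_probability(f_cross, 304, 304)    # df taken as they stand
p_reduced = f_tail_probability(f_cross, 60, 60)   # hypothetical smaller df

print(round(f_cross, 2), p_full, p_reduced)
```

With the df as they stand the estimated tail probability falls well under .01, while with the smaller df it rises above .05, which is the sensitivity described in the text.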

The last column in each analysis of va- 
riance table indicates the estimated con- 
tribution to the total variance of each 
component. Here we see that the va- 
riance other than that due to variation in 
case means decreases from 32% before 
training to 25% after training. How- 
ever, we do not know how to test the 
significance of this apparent decrease in 
“within cases” contribution to total va- 
riance. 

Another analysis of variance procedure 
we have attempted in these cases where 
the same set of judges enters into both 
groups is to apply a single overall analysis 
to the results of both sessions. However, 
up to the present we have had no insight 
into how to test the difference in agree-
ment other than by the method described
above for the separate analyses of vari- 
ance. 
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These three methods we have briefly 
discussed, almost entirely without invok- 
ing the number of basic assumptions vio- 
lated—such as normality, homogeneity of 
variance, independence, etc. — actually 
constitute a small sample of the various 
procedures for testing significance which 
come to mind for handling multi-judge 
reliabilities. Because we have not been 
satisfied with the appropriateness of any
single test we have tended to test the 
same null hypothesis by alternative meth- 
ods in each case, feeling that if all results 
indicate similar trends our conclusions are 
more strongly reinforced. Our need for 
knowledge of more adequate tools for 
handling comparisons of multi-judge re- 
liabilities is confessed. 
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The facts are that both art and science 
enter into the “experienced” Rorschach 
examiner’s interpretation, analysis and 
diagnosis. As in clinical psychiatry, psy- 
choanalysis or clinical practice in general, 
the procedures involved and the avenues 
via which the final desired results are ob- 
tained are not standardized, quantified or 
thoroughly understood. For scientifically 
oriented practitioners there is a constant 
need to reduce the “art” and increase the 
“science” in these areas. 

Not unlike the situation in psychother- 
apy, psychiatric diagnosis and other pro- 
cedures, the diagnosis on the basis of the 
Rorschach depends on sudden or gradual 
insights into a configuration of, for the 
most part, quantifiable factors. These 
factors, however, are not static in rela- 
tion to each other, but exist in a dynamic 


relationship, i.e., of constant mutual modi-
fication. The extent to which each factor
modifies the others, or is modified by them
is not too well known or understood.
When we speak of diagnosis we imply 
the identification of some particular syn- 
drome, or cluster of characteristics as 
differentiated from many others. More- 
over, we also imply that the diagnosis of 
some personality condition which the 
Rorschach merely purports to project 
actually is clear-cut, non-intuitive and 
readily quantifiable. Unfortunately the 
situation is not so. The behavioral char- 
acteristics which constitute a “behavior 
diagnosis” suffer from a similar defiance 
of quantifiability as does its symbolic 
shadow or counterpart—the Rorschach 
protocol itself. For the sake of simpli-
fication and for the greater clarity of this
presentation, two assumptions will be
made:

a. that when we speak of differential
diagnosis by means of the Ror-
schach, we shall refer to a separa-
tion of one type of configuration
from only one other and not from a
multitude of possible diagnoses.
b. that the corresponding behavioral
syndrome correlates are readily
identifiable and not open to doubt
or differences of opinion.


Historically, the attempts at diagnostic 
differentiation on the basis of the Ror-
schach technique passed through two
stages. At first, the distributions of indi- 
vidual factors of one group were compared 
with the corresponding distributions of 
another group or groups. This approach 
characterizes many studies published to 
date. Unfortunately, no single factor 
turned out to be perfectly differentiating. 
There was always some overlapping in
the group distributions, though the indi-
vidual diagnoses were correctly made by
taking into account the “extenuating
circumstances” of the factor under
study, that is, the changes in the configuration
of the remaining factors extracted from
the Rorschach protocol.

The second stage is represented by a 
fairly crude cluster analysis which follows
either inspection or the method described
above, i.e., a comparison of the distribu-
tion of single factors. Thus, varying
clusters of highest differentiating factors
were put together for the identification of
such disorders as neurosis (Miale & Har-
rower), organic involvement (Piotrow-
ski) and others. The procedure followed
was rather impressionistic and the clusters
included hitherto non-quantifiable factors
or variables. Though many positive diag-
noses were missed by this method, the
probable occurrence of “false positives”
was considerably reduced. In the final
analysis, however, these clusters of “signs” 
did not stand up under careful and rigor- 
ous scrutiny. Again, they offer “hints,” 
but no clear differentiation. 

More recent application of the cluster 
method at Michael Reese Hospital in work 
done under grant of the USPHS on a 
comparison of schizophrenics and normals 
did not yet yield highly reliable differ- 
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ences. It is possible that the continuous 
study of an infinite number of clusters may 
lead to the crystallization of a few highly 
reliable ones. Thus far this is still more in the
realm of hope than of fact.

In order to follow through with this 
last suggestion other problems of quanti- 
fication require solution. The experienced 
Rorschach examiner does not base his in- 
terpretation on the quantitative summary 
alone, but uses qualitative data as well. 
These data are mainly raw content or ab- 
straction of factors from the content which 
are certainly not normally distributed but 
occur rarely, sometimes only once or twice 
in large populations. He also uses a num- 
ber of behavioral clues. Some quantitative 
approach for this raw content and be-
havioral observations is needed as well as
the inclusion of the single or startling 
“sign” which is at present not being quan- 
tified, in order to account fully, statis- 
tically, for the process called “diagnosis 
by means of the Rorschach.” The forma- 
tion of unified factors from the content 
which can be treated quantitatively may 
give the cluster method a new lease on life. 

Assuming that all the content and other 
nonstructural factors have been reduced to 
simple quantities, the diagnostic basis of 
the Rorschach is still far from being statis- 
tically established. To be sure, we may 
attain a finite number of quantifiable fac- 
tors, but the number of combinations at 
various levels of strength is astronomical. 
Moreover, each ingredient factor is not 
only modifying and modifiable by other 
factors but also contributes in different 
amounts to the final pattern and diagnosis. 
These different amounts are still un- 
known and have not been subjected to in- 
vestigation. 

Thus, schematically, the situation may 
be summarized as follows, even after quan- 
tification of content takes place : 

1. We have n factors which are not sim-
ilarly distributed; some follow a nor-
mal distribution, some a J-curve and
some, other forms not yet investi-
gated.

2. Each factor varies in the importance
of its contribution to the total con-
figuration: 1a, 2b, 3c, etc.

3. A modifying relationship between
factors 1, 2, 3, ..., n exists, but not
because of a, b, c, etc. This modifying
power is due to another set of char-
acteristics: A, B, C, etc.

Thus, there is a simultaneous set of triple
relationships in which the individual prop-
erties are as yet unknown. If statistical
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methods can be devised to untie this knot 
and break down the enormous complexity 
of Rorschach patterning, a great advance 
would be made in clinical science in gen- 
eral and in personality analysis in particu- 
lar. 
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INTRODUCTION 


Workers in abnormal psychology, guid- 
ance, and educational testing have re- 
peatedly contended that the most mean- 
ingful way to study a person is to obtain 
many data about him and interpret them 
simultaneously. Statisticians have tried to 
trail the clinician as he weaves through 
such data and have had so little success in 
making his processes explicit that they 
have been inclined to say that it is all done 
with mirrors. The clinician’s talk about 
patterns has sometimes been a rationaliza- 
tion used in self-defense when statistical 
controls exploded a particular clinical
claim. But disregarding this frothy and 
irresponsible use of the pattern concept, 
the clinician still has a significant conten- 
tion. Conventional treatments of test 
scores, one score at a time, are inadequate 
for many of the judgments the clinician 
must make. 

One cannot deny that a pattern of data 
is more than the sum of the elements. The 
mind easily recognizes patterns that defy 
operational specification. A navigator can 
recognize Orion in an instant; no doubt 
some electronic device could do this, but 
you can imagine its fearful complexity. 
The best of electronic technicians were 
building sound-detection gear for anti- 
submarine operations, but their devices 
always ran far behind the human ear in 
ability to discriminate sound patterns. 
Perhaps the best answer to those who 
claim that so-called patterns may always 
be reduced to linear combinations of scores 
is the judging of beauty. A girl’s face can 
be measured. You can get a vast number 
of reliable measures to compare two girls. 


But imagine the problem of writing a for-
mula to tell which is prettiest. When a
man makes that judgment “intuitively” he 
interprets the pattern of features as well 
as the separate dimensions—as the boy 
said, “It’s not what she’s got, it’s the way 
it’s put together.” 

A pattern is a set of scores and their in- 
terrelations. In describing a personality, 
we use a tremendous number of dimen-
sions. I shall discuss only the interpre-
tation of patterns from a single multi-
score test, since that is a sufficiently vex- 
ing problem. By this limitation we avoid 
one difficulty found in clinical analysis, 
that one person can be rated on some 
traits and a second person is rated on other 
traits, but no person is rated on all the 
dimensions. 

Clinical tests differ from traditional 
measuring devices. The Wechsler test 
yields eleven scores. The Rorschach gives 
twenty or so. In the conventional psy- 
chometric instrument, an hour may be de- 
voted to obtaining a single score for a sub- 
ject. The clinician, recognizing that a lim- 
ited time is available, has tried to see how 
many measurements he can make in an 
hour. The difference is somewhat like that 
between an oculist’s examination and an 
Army physical. Both deal with vision, but 
the Army sacrifices exactness in order to 
cover numerous variables unobserved by 
the oculist. A psychologist using some of 
the newer clinical tests will report, for an 
hour of testing, not only ten to twenty 
standardized scores, but a similar number 
of informal observations about traits not 
represented in formal scores. 

LEE J. CRONBACH

The result of the clinical test can be ex-
pressed as a point in k-space, where k is
the total number of traits. Considering 
only the 11 Wechsler scores, we need an 
11-space to report the data. When a per- 
son has been located in 11-space, we have 
recorded all the data about him, patterns 
and all, that the Wechsler scores permit. 
When we have scores for N cases, we wish 
to describe how they are distributed in k- 
space and to compare different distribu- 
tions. Virtually all our statistical ques-
tions reduce to this: “Cases from sub-
group A may be concentrated more heavily
in certain regions of the k-space than are 
people in general. How can we identify 
those regions and test differences in con- 
centration for significance?” 


LOGIC OF CLINICAL STATISTICS


The attempt to treat clinical tests statis- 
tically has sometimes been mistakenly at- 
tacked as a violation of their nature. Thus 
Frank appears to argue that cases can-
not be treated objectively, quantitatively,
and anonymously, when projective tests 
are used. Projective test data can be legi- 
timately treated and judged by psycho- 
metric criteria—that is, in terms of score 
deviations from group norms, shift in score 
from trial to trial, and comparison of cri- 
terion scores for people with similar test 
performance. For the projective data to 
be judged fully and fairly, however, it is
necessary (a) that all the data elicited be
considered, not just a few scores, and (b)
that the data be considered simultaneously, 
not one score at a time. This is permitted 
by conceptualizing the data for a person 
as a point in a vast enough k-space. The 
question, then, is not whether psychomet- 
ric logic is appropriate—it is both appro- 
priate and indispensable. The problem is 
that psychometric methods are not ade- 
quate to these new demands, and may, in 
truth, never become fully adequate. 

Before analyzing statistical methods, we 
may note some features of multi-score 
tests. Each subscore, being obtained in a 
short time, is usually unreliable. Scores are 
unequally reliable, which is unfortunate 
in view of our extensive use of profiles. 
The shape of a person’s profile is distinctly 
different from his probable profile of true 
scores, if the separate scores are unequally 
reliable. Statisticians could help here by 


devising methods for estimating the true 
profile of a person, given some ten un- 
equally-fallible correlated scores. Esti- 
mated true profiles should be more in- 
formative than profiles of raw scores and 
important in examining differences be- 
tween scores. One other difficulty is that 
test scores are not equally correlated, as 
some statistical methods demand. If tests 
could be made more reliable, equally re- 
liable, and equally intercorrelated, clinical 
testing would more quickly produce signif- 
icant findings. As it is, we are constantly 
weaving complex tapestries of statistics 
with threads of data that are much too 
gossamer for the project. Maybe we need 
to spend 25 hours testing each subject 
before beginning to use refined statistics. 

Very reliable single-score data are not 
necessarily superior to fallible multi- 
scores. Suppose for the moment that per- 
sonality is assumed to constitute a fifty- 
dimensional universe. The person’s true 
position is a certain point in this 50-space. 
The psychometrist uses an hour to give 
the Binet scale and reports an IQ of 115.
Suppose this IQ to be precisely right;
what has it done? It has located our sub- 
ject in a particular 49-space, but has left 
his location in the remaining dimensions 
completely unspecified. So that if we guess 
where our man falls and could measure 
the error by the distance from his true po- 
sition, we might be extremely wrong in 
locating him. The clinician gives a clinical 
test and arrives at somewhat inaccurate 
estimates on twenty traits, in the same 
hour of testing. So he locates the subject 
in 50-space, twenty of his statements be- 
ing approximations and thirty of them 
being guesses. Which procedure is most 
advisable is unanswerable, though I esti-
mate that the IQ permitted one to locate
the person in 50-space with an efficiency 
one per cent better than chance, and the 
twenty scores (if their reliability is .50) 
permit one to make an estimate eleven per 
cent better than chance. What we have is 
the question whether you can find a man 
better by knowing that his house is on the 
parallel 38°40’ or by knowing that he lives 
somewhere in Kansas. Either way, the 
data are not as good as we need, and that 
is the way it is in personality. 
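The figures of one per cent and eleven per cent are not derived in the text. One reading that yields values close to them, offered here only as an assumption, treats each of the fifty dimensions as contributing unit error variance when unmeasured and (1 - reliability) when measured, and compares the root-mean-square error of location with that of a pure guess. The function name and this error model are ours.

```python
import math

def residual_error_fraction(total_dims, measured_dims, reliability):
    """Root-mean-square location error as a fraction of the chance-level error.

    Assumed model: an unmeasured dimension contributes unit error variance;
    a measured dimension contributes (1 - reliability).
    """
    unmeasured = total_dims - measured_dims
    error_variance = unmeasured + measured_dims * (1.0 - reliability)
    return math.sqrt(error_variance / total_dims)

# One perfectly accurate score (the IQ) in a 50-dimensional universe.
gain_single = 1.0 - residual_error_fraction(50, 1, 1.0)

# Twenty scores, each with reliability .50, in the same universe.
gain_multi = 1.0 - residual_error_fraction(50, 20, 0.5)

print(round(100 * gain_single, 1), round(100 * gain_multi, 1))
```

Under this reading the single exact score improves on chance by about one per cent and the twenty fallible scores by roughly ten to eleven per cent, in line with the estimate quoted above.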
SINGLE-SCORE METHODS


The simplest statistical approach to han- 
dling test data is to treat one score at a 
time, as in the t-test or analysis of vari- 
ance. This amounts to projecting the k- 
space distribution onto one line at a time. 
Such a method fails to consider interrela- 
tions of scores. Perhaps low Comprehen- 
sion is indicative of schizophrenia when 
accompanied by high Block Design, but 
not when the subject’s Block Design score 
is normal or low. Treating one score at 
a time therefore overlooks interaction dif- 
ferences. 

This method also encounters trouble- 
some problems of parsimony. When a 
great many significance tests are made, 
numerous differences which would con- 
ventionally be called significant arise by 
chance. This is not always recognized in 
clinical research; I counted, in one study,
800 significance tests, yet the investigator 
solemnly interpreted each difference that 
reached the five per cent level. If scores 
are uncorrelated, we can estimate the num- 
ber of low P’s arising by chance. But in 
many studies, the scores are correlated. 


We need some way of evaluating signifi- 
cance tests when the same samples are 
compared on many separate correlated 
variables.
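For the uncorrelated case the estimate mentioned above is simple arithmetic. The sketch below, with function names of our own choosing, assumes independent tests each carried out at the same level; it does not address the correlated case, which is the difficulty the text raises.

```python
def expected_chance_hits(n_tests, alpha):
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha):
    """Probability of one or more chance 'significant' results among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

# The study cited in the text: 800 tests, each interpreted at the five per cent level.
hits = expected_chance_hits(800, 0.05)
any_hit = prob_at_least_one(800, 0.05)

print(hits, any_hit)
```

On these assumptions about forty of the 800 differences would be expected to reach the five per cent level by chance alone.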


TREATING SCORES SIMULTANEOUSLY 


If we try to attack the matrix of scores 
all at once, we encounter other problems. 
The clinician thinks of people individually. 
For statistical methods, we must group 
the points in k-space, identifying cases 
having the same pattern or closely similar 
patterns. When we think of grouping our 
cases in categories, we learn that this is 
impractical. Suppose we divide each of 
the Wechsler dimensions into only two 
categories, high and low. Then there are 
2^11 (= 2048) possible combinations of the eleven
scores to be considered. That means an 
N of 2000 just to get one representative 
of each unique pattern. And no person 
working with the Wechsler test would ac- 
cept the high-low dichotomy as an ade- 
quate representation of the subtest per- 
formance. If we divide each scale more 
finely, the number of patterns mounts 
astronomically. 
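The growth in the number of patterns is easily tabulated. This small sketch assumes only that every one of the eleven scales is cut into the same number of categories; the function name is ours.

```python
def pattern_count(levels, n_scores):
    """Number of distinct score patterns when each of n_scores scales has `levels` categories."""
    return levels ** n_scores

# Eleven Wechsler scores cut into 2, 3 and 5 categories each.
for levels in (2, 3, 5):
    print(levels, pattern_count(levels, 11))
```

Even a three-way split of each scale already gives far more patterns than any clinical sample could fill.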

Because of the number of possible pat- 


terns, evidently practical research on pat- 
terns can be done only in two ways. One 
is to study just those subdivisions of the 
k-space where many cases fall. Perhaps 
Q-technique is suitable for this. The other 
method is to reduce the number of dimen- 
sions. The popular method of considering 
parallel profiles as equivalent reduces
the k-space only to (k-1)-space and is 
therefore of little help. 

We can reduce k-space to 1-space by
dichotomizing the distribution. This is 
the much-used signs approach. The inves- 
tigator sets up a hypothesis (before look- 
ing at the data if the significance test is to 
be trusted) that cases of type A are con- 
centrated in some region. The surface 
bounding the region may be complicated, 
being defined in terms of a great number 
of scores. Ordinarily chi-square is used 
to test the hypothesis about concentration. 
The signs approach is limited in usefulness 
because the investigator must decide what 
hypothesis to try. Perhaps he tests 
whether high Vocabulary—high Digit 
Span—low Arithmetic characterizes 
schizophrenics. It is impossible for a per- 
son to inspect the Nk facts before him 
and decide where concentrations occur. He 
may notice some relations, but he is likely 
to miss others. Statistics would be most 
helpful if they could systematize the in- 
spectional stage, where signs are defined. 
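The chi-square test used in the signs approach can be sketched for the simplest case, a fourfold table of sign present or absent against type A or other. The counts below are invented for illustration, and the function is a plain Pearson chi-square without continuity correction; the P-value uses the fact that a chi-square with one df is the square of a standard normal deviate.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (1 df, no continuity correction) and its P-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = sign present / absent, columns = type A / others.
counts = ((30, 10), (20, 40))
chi2, p_value = chi_square_2x2(counts)
print(round(chi2, 2), p_value)
```

The test is trustworthy only if the region defining the sign was fixed before the data were inspected, as the text insists.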

The psychologist wishes to consider not 
only the simplest types of differences, as 
between two Gaussian multivariate dis- 
tributions with corresponding axes paral- 
lel. In the present stage of our concepts, 
we wish to leave open the possibility of 
discovering curvilinear distributions, mul- 
ti-modal concentrations, and multi-valued 
regressions. 

The multiple regression and discrimi- 
nant function methods do process the data 
mechanically. We arrive at an equation
like that relating Rorschach scores to pilot
success:

X = 2(Dd + S%) + 6FM + 8W + R - 15D% - R_{8-10}%

The equation reduces a six-dimensional
array to one dimension by projecting 
every case on the line normal to the hyper- 
plane defined by the equation. By this 
particular equation, we treat two cases as 
identical which are alike except that one 
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has high FM and the other has high W. 
This is completely contrary to the theory 
behind the test and makes no particular 
sense. All that the equation—which, by 
the way, turned out to be invalid—can do 
is to provide the most efficient formula of 
this type. It expresses an empirical rela- 
tion and may well be the best first-degree, 
one-dimensional approximation that can 
be obtained. But it is obvious that when 
we plunge from six dimensions to one we 
have discarded a tremendous amount of 
information. Only if all the points in 
seven-space (six space plus the criterion) 
are co-planar can we project our data 
without losing information. If the vari- 
ation from a planar distribution is only a 
matter of chance errors, this is unimpor- 
tant. But psychologists would like to ex- 
plore the possibility that the distribution 
in six-space is anything but planar. 
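The loss of information can be seen by evaluating the composite for two invented cases. The weights are those of the equation quoted above; the last term is read here as the percentage of responses to the last three cards, which is an assumption about a subscript that is hard to read in the source, and all score values are hypothetical.

```python
def pilot_composite(dd_plus_s, fm, w, r, d_pct, r_8_10_pct):
    """Weighted sum from the quoted equation (the last term's subscript is assumed)."""
    return 2 * dd_plus_s + 6 * fm + 8 * w + r - 15 * d_pct - r_8_10_pct

# Two hypothetical cases, alike except that one is high on FM and the other on W.
case_high_fm = dict(dd_plus_s=10, fm=7, w=5, r=30, d_pct=2, r_8_10_pct=40)
case_high_w = dict(dd_plus_s=10, fm=3, w=8, r=30, d_pct=2, r_8_10_pct=40)

x_high_fm = pilot_composite(**case_high_fm)
x_high_w = pilot_composite(**case_high_w)

print(x_high_fm, x_high_w)
```

Both cases receive the same value of X, so any distinction the clinician would draw between them is no longer visible once the six scores are projected onto one line.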

The discriminant function fares no 
better. It, too, gives a simple equation 
which combines many variables. This 
equation defines a surface which cuts our 
space into two parts; hence, it reduces our
data to one dimension. It is true that the 
discriminant function can be of an order 
higher than one, just as the linear regres- 
sion equation can be replaced by a curve. 
The limitation of higher-order discrimi-
nant functions is in part practical; compu-
tational difficulties multiply very rapidly
and huge samples are required to deter- 
mine several parameters. But the other 
question is whether the relationships we 
are investigating are simple enough for 
even fairly involved functions to do them 
justice. If, for example, there are several 
personality patterns that make good 
teachers, it will take a very complex func- 
tion to cut these people out of the herd. 
The problem can be simplified only if the 
investigator can inspect his data so bril- 
liantly that he recognizes the types of good 
personality and separates them from each 
other. 

For our most common problems, we 
need a statistical method based on the con- 
centration principle. Here is how it would 
work. If we wish to compare schizo- 
phrenics and normals on trait T, we could 
divide the T continuum into intervals. We 
would compute what proportion of the 
cases in each interval are schizophrenic 
and plot the results thus: 
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The smooth curve is the hypothesis sug- 
gested by our data, assuming that we have 
a large sample. This shows where schizo- 
phrenics appear to be concentrated. If we 
can have a significance test for a procedure 
like this and can generalize it from one 
trait to k traits, we will have what we 
need for discrimination problems and for 
predicting linear criteria. 
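The tabulation step of the concentration principle, for a single trait, might be sketched as follows. The data, the interval boundaries and the function name are all invented for illustration; no significance test is attempted, since that is exactly what the text says is still lacking.

```python
def proportion_by_interval(scores, is_target, edges):
    """Share of target-group cases in each interval [edges[i], edges[i+1]).

    Returns None for an interval containing no cases.
    """
    counts = [0] * (len(edges) - 1)
    targets = [0] * (len(edges) - 1)
    for score, flag in zip(scores, is_target):
        for i in range(len(edges) - 1):
            if edges[i] <= score < edges[i + 1]:
                counts[i] += 1
                targets[i] += int(flag)
                break
    return [t / c if c else None for t, c in zip(targets, counts)]

# Hypothetical scores on trait T and group membership (True = schizophrenic).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
is_schizophrenic = [True, True, True, False, True, False,
                    False, True, False, False, False, False]
edges = [0, 4, 8, 13]

proportions = proportion_by_interval(scores, is_schizophrenic, edges)
print(proportions)
```

Plotting these proportions against the interval midpoints and smoothing would give a curve of the kind described; generalizing to k traits means replacing intervals by cells of the k-space.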

We will probably begin to encounter 
multiple criteria soon, where performance 
is described in a criterion space of several 
dimensions. The perplexing problem of 
relating the distribution of test scores in 
k-space to the criterion scores in k’-space, 
without assuming linearity of relations, 
lies ahead. 

Incidentally, reliability and change with 
time in two or more dimensions present 
knotty problems. We may develop useful 
methods for dealing with patterns of two 
or three scores. But when we try to com- 
pare the distribution of scores in two- 
space with a second distribution for the 
same cases, we have a four-dimensional 
array. We could consider the error of 
measurement as a distance between the 
true score and observed score for each 
case. We could get the ratio of the squared 
error of measurement to the dispersion of 
all cases about the centroid. But this ap- 
proach ignores the fact that the error is 
a vector quantity, as is the dispersion. The 
ellipse representing the distribution of ob- 
tained scores for a single person will have 
a different shape from the ellipse for the 
distribution of persons. 


PATTERN TABULATION 


One useful ad hoc procedure for some
of our problems is pattern tabulation. It
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is severely limited since it copes with only 
two or three scores at a time, but even that 
is an advance. One begins with any three 
scores, preferably reliable ones. In the 
Wechsler test they may be derived from 
sets of subtests. In the Rorschach, I have 
dealt with the better-founded scores such 
as W, D, and Dd, or M, sum C, and F. 
Each score is normalized. The general 
factor underlying the three scores is re- 
moved by computing deviation scores, 
each deviation score being the difference 
between a person’s score and his mean 
score in the three variables. Differences in 
the general factor are treated separately. 
The essential assumption is that profiles of 
normalized scores having the same shape 
are psychologically equivalent. The shape 
of the profile is defined, in two dimensions, 
by the deviation scores. These are plotted 
in homogeneous coordinates, giving a 
plane diagram. The distribution of pat- 
terns in the plane is informative; for the 
first time, we can get our three-dimen- 
sional data laid out where we can see them. 
If there are concentrations of cases of a 
given type in some region of the plane, 
chi-square permits us to test for signifi- 
cance. Here, also, we must avoid spurious 
significance claims based on hypotheses 
tailored to a single sample. Treatment of 
the data by correlation methods or by 
analysis of dispersion appears unsatisfac- 
tory, and analysis of dispersion runs into 
the vector problem. For our data, where 
we know rather little about the homo- 
geneity of subgroups or the linearity of 
relationships, chi-square seems much 
more satisfactory. 
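The removal of the general factor can be sketched in a few lines. This is a simplified illustration, not the published procedure: ordinary standard scores stand in for the normalizing transformation, the function names are ours, and the plotting in homogeneous coordinates is omitted.

```python
import statistics

def standardize(column):
    """Convert one score to standard form across the sample (stand-in for normalizing)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]

def profile_shape(standard_scores):
    """Deviation of each score from the person's own mean over the three scores."""
    level = statistics.fmean(standard_scores)
    return [s - level for s in standard_scores]

# Two hypothetical persons whose profiles are parallel but differ in general level.
person_a = [1.0, 0.0, -1.0]
person_b = [2.0, 1.0, 0.0]

shape_a = profile_shape(person_a)
shape_b = profile_shape(person_b)
print(shape_a, shape_b)
```

Because the three deviation scores of a person sum to zero, only two of them are free, which is why the patterns can be laid out in a plane.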


A MATCHING PROCEDURE 


If we desire to escape entirely from the 
sorts of difficulty we have been discussing, 
we can permit the clinician to operate on 
his data intelligently, and then validate his 
synthesis of the test results. We come 
very close to the personality as a whole 
by Vernon's matching method. The psy- 
chologist writes qualitative descriptions of 
cases using as many traits as required. 
Then criterion data are obtained, and 
these may be complex and qualitative. 
Judges are required to match descriptions 
with criteria, working with sets of per- 
haps five cases at a time. Unfortunately, 


a description that is half false may still be 
matched if it fits the criterion better than 
some other description. 
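The chance baseline for such matching can be worked out by enumeration. The sketch assumes that a judge with no information pairs the five descriptions with the five criteria at random, one to one; the function name is ours.

```python
from fractions import Fraction
from itertools import permutations

def chance_match_distribution(n_cases):
    """Probability of 0, 1, ..., n correct matches under purely random one-to-one matching."""
    counts = [0] * (n_cases + 1)
    for assignment in permutations(range(n_cases)):
        hits = sum(1 for position, item in enumerate(assignment) if position == item)
        counts[hits] += 1
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

distribution = chance_match_distribution(5)
expected_hits = sum(k * p for k, p in enumerate(distribution))

print(expected_hits)
print([str(p) for p in distribution])
```

On the average one description in the set is matched correctly by chance, so the judges' success must be compared with this baseline rather than with zero.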

I have recently expanded the Vernon 
method so that we can examine an analy- 
sis such as that from the Rorschach and 
determine its accuracy piece by piece. The 
description of each case is broken into 
statements. Criteria are obtained for the 
subject of the description and for two 
other persons. Numerous judges compare 
the descriptive statement with a criterion, 
each judge being given just one of the 
three criteria. After each judge decides 
whether the statement does or does not fit 
the criterion he holds, we can say whether 
the statement was significantly valid, in- 
valid, overgeneral, or ambiguous. The full 
elaboration of the method provides for 
randomization of statements among 
judges, tests of significance, etc. The full 
procedure has been described at length 
elsewhere.

SUMMARY 


It is impossible to summarize so ram- 
bling a paper as this. A few points are 
worth emphasis. Given k scores, all the 
data can be represented in k-space. We 
must ordinarily reduce the data to fewer 
dimensions by considering certain pat- 
terns to be equivalent. When this is done 
by straightforward computation, as in 
the multiple-regression method, patterns 
are often set equal to each other which 
common sense says are unequal. If we can 
provide a psychological definition of equiv- 
alent patterns, mathematical methods can 
no doubt be applied. Just now we are in 
the stage of seeking methods which are 
complex enough to be adequate and simple 
enough to be manageable. 
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INTRODUCTION 


It is helpful to distinguish, at the out-
set, between the psychological matters in-
volved in type psychology, and their for-
mal, logical, or psychometric representa-
tion. The present paper is concerned pri-
marily with the latter. In earlier pa-
pers I suggested that a sufficient
basis for representing type psychologies 
is to be found in Q-technique, that is, in 
terms of correlations between persons, 
and in the present paper I propose to put 
the matter in a statistical setting, with 
the object of bringing the technique to 
bear upon some of the current preoccupa- 
tions of psychologists with theories of per- 
sonality. The central idea is a simple 
one: it consists of defining universes of 
traits or similar observable characteris- 
tics which can be sampled. The sample 
is then used to describe, as a statistical 
distribution, certain aspects of personal- 
ity; finally these descriptions about per- 
sons are correlated and examined in 
terms of suitable factor theorems, for 
example, those referring to common fac- 
tors, specificities, simple structure error 
and person variabilities, and the like. 


The correlational theorems are employed 
mainly in a deductive framework, being 
little different, in this respect, from a t- 
test or other convenient tests of statisti- 
cal significance.

Types, in Q-technique, are represented 
by common factors, often by correlated
factors. But these types are very dif-
ferent from those described in the psy- 
chological text-books. It has been usual 
in the past to think of types as sections or 
cuts along a univariate scale: thus, idiots, 
morons, feebleminded, normal, and su-
perior types of persons (from the stand-
point of intelligence) are so defined, as 
sections marked off along a scale of intel- 
ligence. Text-books also place the ex- 
treme extravert indefinitely at one end 
of a linear scale, and the extreme intro- 
vert at the other, and, if the function is 
normally distributed, most of us are 
neither of one type nor the other. Typifi- 
cation of this kind, however, has little or 
no psychometric rationale, and is wholly 
without formulation at a theoretical level. 
It is scarcely to be wondered at, there- 
fore, that there is currently among psy- 
chologists a wide-spread disbelief (as 
they put it) in types of any kind, and a 
distinct opposition to any form of typol- 
ogy or system of type psychology. Whilst 
it is not my object to defend any of these 
systems as such, it should be clear, as we 
proceed, that the above text-book notion 
of types does less than justice to the vast 
inductive system of, for example, Jung’s 
psychological types. Nor, indeed, does it 
represent at all what is involved in typolo- 
gies in general. 

The types described by Jung, for
example, could have been, or in fact had 
been, real persons, flesh and blood crea- 
tures, and not mere points on a linear 
scale. It was no doubt always implied 
that +3 in standard terms on a scale of
introversion-extraversion subsumed or ac- 
counted for many details of personality, 
of the kind that give verisimilitude to it. 
After all, or so it seemed, Jung’s system, 
built as it was upon inductive inference, 
and rising from the particulars of clini- 
cal experience to the highly abstract con- 
ceptions of introversion and extraversion, 
seemed to call for crucial tests of intro-
version-extraversion as the most essen-
tial matters at issue. And all the many 
attempts that have been made to measure 
these I-E attitudes, in such scales as the 
Bernreuter or the Neymann-Kohlstedt, or
in the studies of the Guilfords,
Drake, Gray and Wheelwright, Abernethy,
and many others, as well as
the related work on perseveration, espe- 
cially of the Spearman school, all clearly 
had some such notion in mind, that one 
could hope to isolate and to measure a 
unitary function in terms of individual 
differences. The goal was one such func- 
tion if possible, embracing large universes 
of individual differences. And little, in 
point of fact, has been achieved along this 
line. It is not unjust to say, I think, that 
most of this work was misconceived, for 
it threw away all the deductive possibili- 
ties of a system such as Jung’s, and in no 
way really represented what he had in 
mind. Much the same can be said of 
other typological systems handled in the 
past by psychometrists in terms of sepa- 
rated functions and individual differences. 

It is probable, indeed, that if the men- 
tal tester’s concern in the past had been 
with sheep, or goats, or any other animals, 
instead of with the personalities of hu- 
man beings, a distinction would have 
been observed from the outset between 
species or types based upon the orderliness 
of their parts relative to the animal as a 
whole, and the mere overall sizes of any 
of the separate components of the whole 
as such. A big sheep, for example, or a 
little one, an old one or a young one, is 
still of the type mouton, and a large or a 
small goat likewise of the species bouc.
If I were to represent animals in general
by a universe of linear measurements
taken, say, about 1000 lines fixed by the
morphology of vertebrates, the rela-
tive sizes of these skeletal parts within 
each animal would enable me to distin- 
guish sheep from goats, whatever their 
age or condition; and any “mixed” spe- 
cies would be as distinct and as unequi- 
vocal as any pure type itself. Only a 
butcher is primarily interested in the in- 
dividual differences in sizes of sheep. 
The zoologist would take more stock of 
such differences if one type were always 
exactly twice the overall size of another,
within error limits. Similarly for the
study of personality types, our interest 
could be primarily in the specification and 
depicting of, say, the schizophrenic type 
in terms of the relative significance of a 
host of parts or component characteristics 
within the personality itself, and not at 
all in a search for a large schizophrene,
so to speak; nor need we look for a per- 
son extreme or heavy with introversion. 
Relativity of parts is involved, and not a 
butcherlike preoccupation with overall 
sizes as such. It is this matter of internal
relationships that Q-technique represents 
in a systematic manner. It does so by de- 
fining universes of observable character- 
istics (such as traits of a high degree of 
particularity), the significance of whose 
parts, relative to one another, makes it 
possible to describe personality, or as- 
pects of it, much as the relative measure- 
ments of the 1000 anatomical parts of an 
animal could allow us to describe the 
shape of a sheep, or of a goat, and at the 
same time provide us with a means for 
distinguishing between them without equi- 
vocation. It is suggested, then, that sys-
tematic typology should be conceived as 
the study of statistical universes of par- 
ticulars in terms of which detailed de- 
scriptions can be given of whole aspects 
of individual personalities. 
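
The sheep-and-goat argument can be put in a few lines of arithmetic. The following is a minimal sketch, assuming six invented skeletal measurements per animal (the values and the names small_sheep, big_sheep and goat are hypothetical, not data from the paper). It shows that a product-moment correlation between two profiles is unchanged when one animal is simply a scaled-up copy of the other, so that only the relative sizes of the parts affect the result.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements (arbitrary units) along the same six skeletal lines.
small_sheep = [10.0, 4.0, 7.0, 3.0, 9.0, 5.0]
big_sheep = [2.0 * v for v in small_sheep]   # same proportions, twice the size
goat = [6.0, 8.0, 3.0, 9.0, 4.0, 7.0]        # different proportions

r_sheep = pearson(small_sheep, big_sheep)    # overall size drops out entirely
r_mixed = pearson(small_sheep, goat)         # the shapes themselves differ
print(round(r_sheep, 3), round(r_mixed, 3))
```

The two sheep correlate perfectly whatever their sizes, while the sheep and the goat do not; this is the sense in which correlating persons over a sample of traits captures the pattern of a profile and ignores its overall level.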


TRAIT-UNIVERSES 


These universes of particulars, then,
have to be defined. We have to begin
somewhere, and all that is required, at 
bottom, is that there should be a large 
number of observable characteristics, of 
a high degree of particularity. They may 
be traits, whether defined as “immediate 
behavioral acts” or as “recurrent patterns 
of behavior”; but highly generalized
terms, such as “neurotic,” or “intelli- 
gent,” are usually unsuitable. All the 
possible responses to the Rorschach test 
might constitute a trait-universe (as I 
shall call such lists); or they may be 
items of performance, as in the example 
I gave some years ago of the statistical 
description of a person’s reactions in a 
performance (intelligence) test situation.
I have defined as a trait-universe,
similarly, some 2000 statements made by 
Jung in his writings about personality 
types: and it may be recalled that where-
as I can find as many as this, Guilford
could scarcely exceed 35 introversion-
extraversion traits for purposes of his 
factor studies—clearly he was looking for 
highly generalized statements. My own 
list, on the contrary, contains almost 
everything given by Jung, from very par- 
ticular statements such as “courteous, but 
with a certain uneasiness,” “has awkward 
experiences with his friends,” “sometimes 
put down as an immoral person,” to 
phrases which demand special knowledge 
for their understanding, such as “a prey 
to anxiety, lest his phantasy becomes 
real,” “when confronted with a strongly
emotional situation, is momentarily lame, 
and becomes resistive,” “the object has a 
sensuous hold on him,” and the like. 

It is interesting that one could hardly 
assume, without distorting matters, that 
any of these 2000 traits is normally dis- 
tributed with respect to individual differ- 
ences. On the contrary it is more reason- 
able to suppose that their frequency dis- 
tributions would be skewed. From the
outset, therefore, they are debarred from
correlational study by way of R-technique,
that is, with respect to individual
differences for a sample of persons taken 
at random from a population of persons, 
for which one has to assume normal dis- 
tributions. With increasing generality, 
however, the individual differences may 
no doubt take on the normal shape, and 
the traits used by the Guilfords no
doubt passed muster in this respect. 

But to return to trait-universes. A 
steady working through Beck’s well-
known books on the Rorschach test
provides us with several hundreds of 
statements particular to the test, such as 
“form rigid (F),” “stereotyped,” “over- 
emphasizes the abstract-theoretical ap- 
proach,” and so forth; I have compiled 
500 such items, and these, too, I regard 
as a trait-universe for typological study. 
But, as I indicated in a recent paper,
when a case is written about in language 
that we expect a client to understand, 
statements are used which are appropriate 
to the need. Prospective nurses are writ- 
ten about in language that a Sister or a 
Matron can appreciate, such as “sweet-
natured,” whereas terms such as “hard-
headed,” “business-like,” and the like, are
more applicable in the commercial world. 
In the paper just referred to I made use
of a trait-universe of 800 such everyday 
terms, compiled from my case reports on 
about 100 men and women who had been 
tested by Rorschach and other means, and 
whose personalities had been described in 
language suitable to the needs of a Lon- 
don business firm which was interested in 
selecting high-grade trainees. As I have 
just suggested, a different trait-universe 
would be needed for describing prospec- 
tive nurses, and a different one again for 
officer cadets. Each of these trait-uni- 
verses really represents a psychological 
or behavioral field in which business 
men, nurses, or cadets, respectively, have 
to fit and function. Moreover, the same 
person may assume very different per- 
sonalities in these different fields—as hap- 
pens to the hen-pecked husband who is a 
veritable dragon on the golf course and 
a wretched termagant at work. All such 
changes can be represented with some 
nicety by Q-technique procedures. In the 
same way the mellifluous phrases of a 
Shakespeare may be listed, all, that is, 
that concern the description of personal- 
ity, and used to represent personalities in 
the theatrical milieu, or to describe, even 
in factor terms, the changes in the behav- 
ior of a Lady Macbeth or an Othello as 
the plays proceed. 

Ideally, then, trait-universes consist of 
innumerable “units of behavior,” or “ob-
servable characteristics” (Keynes). These 
may be regarded as given in the sense that 
populations of children of a given age 
range are accepted as the basis of work 
on individual differences. The 2000 
Jungian traits may be so accepted. Hav- 
ing defined any such universe, samples 
may be drawn from it at random, and it 
is these that we use to provide the descrip- 
tions of individual personalities, even if 
there should happen to be only one per- 
son in the world upon whom we want to 
experiment. 
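
Drawing a sample from a defined trait-universe is a simple mechanical step. The sketch below assumes the universe is held as a list of 2000 labelled statements (the labels trait_0001 and so on are placeholders, not Jung's wording) and draws 121 of them at random without replacement, the sample size used later in the paper. The seed is an arbitrary choice made only so that the draw can be repeated.

```python
import random

# Hypothetical stand-in for a trait-universe: 2000 labelled statements.
universe = [f"trait_{i:04d}" for i in range(1, 2001)]

rng = random.Random(1950)            # fixed seed so the same draw can be repeated
sample = rng.sample(universe, 121)   # simple random sample, no trait drawn twice

print(len(sample), sample[:3])
```

Every person, or every occasion for the same person, is then assessed on this one sample, which is what makes the resulting arrays comparable.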

Trait-universes can also be systematic- 
ally composed, to suit the needs of vari- 
ance analysis and factorial design (Fish- 
erian) as well as correlational analysis. 
But this is a matter for separate consider- 
ation elsewhere. It is sufficient to sug- 
gest, here, that where there is a theoreti-
cal framework, such as Jung’s system 
provides, traits can be uniquely defined to 
cover all possible combinations of the 
functions and supposed processes at is- 
sue; the method is precisely that used 
originally by Crutchfield to devise the
243 different rat experiments for four
“treatments” and three “grades” or
“levels.” For the Jungian scheme, in the
same manner, it is possible to compose
1024 traits, each uniquely determined, for 
four “treatments” and four “grades” ; or, 
if two traits of each kind are fixed upon, 
a total of 2048 is available for experi- 
mental work. Variance analysis, how- 
ever, does not permit us to represent types 
in any systematic way; but it is clear that
trait-universes can be defined, following 
the Crutchfield pattern, to which either 
correlational or variance methods may be 
applied. Similarly important, as a sys- 
tematic matter, is the fact that  trait- 
universes can be composed in such a way 
that mean scores are always zero for each 
person. This is the case for the I-E uni- 
verse for example. 


THEORIES OF PERSONALITY 


Trait-universes, however, are only the 
outer shell of personality, and the study 
of personality does not, in fact, consist of 
the investigation of these traits per se. 
Account has to be taken of the different 
psychological or behavioral fields in which 
the personalities function: different 
trait-universes, I have suggested, can 
take care of such different conditions. 
But, similarly, ever on the search for the 
causa finalis, there are psychologists who 
wish to reach into personality, into inner 
structures and cores of steady motivation, 
which cause behavior to be what it is. 
The essence of personality, according to 
this view, is found in “perpetual mo-
tives,” and not in traits, nor in behav-
ior per se. From this standpoint, too, the 
apparent arbitrariness of traits is ex- 
plained away: thus, the most reliable, 
punctilious, and trustworthy secretary I 
ever knew had been a delinquent in other 
circumstances. But she was obsessional, 
with a severe super-ego structure at the 
core of her personality, and this (together 
with opportunity) explained her delin- 


quencies in the first place, as it does her 
rigid punctiliousness as a secretary. She 
was indeed the most moral of persons, for 
even her stealing had been a self-punish- 
ment, as was clear from certain attendant 
rituals (of prayer before stealing, for ex- 
ample, and scarification of her wrists and 
breasts and other frightening compul- 
sions): the over-severity, and the oppor- 
tunity to steal, explained the delinquency 
—these were the “conditions of behav- 
ior,” and not any traits as such.

A different picture is presented by All- 
port’s concept of functional autonomy.
For, if we continue the example just 
given, it may be supposed that the sec- 
retary is now married, and that ambi- 
tion, efficiency, and proud jealousy of her 
new-found status, characterize her: she 
is now thoroughly motivated as a young 
mother. It may be supposed that age, 
success, and the increasing Ego develop- 
ments attending these, overlay all else, so 
that the earlier harshnesses are no longer 
apparent. She now functions autonom- 
ously, in terms of this newly-won motiva- 
tion. This seems quite possible: yet a dis-
cerning eye could no doubt spot the tell-
tale inner structure, the same severe
Super-ego. Even so, it would be a mis-
take to underestimate the strength and 
reality of her present motivation as the 
condition of her behavior. 

Roughly, then, I assume that personal-
ity implies matters of the above order. 
One’s behavior may be different, but the 
inner motives the same; motives may be- 
come autonomous, certainly approximate- 
ly so; different field conditions mediate, 
too, and perhaps crucially so on occasion. 

A few basic motives, however, do not 
form a trait-universe. But the character- 
istics or observations upon which these 
motives are based, can do so. It is thus 
possible to list the universe of behavior 
out of which the psycho-analyst draws his 
notions about the Id, Super-ego and Ego; 
and it is this that we would use to de- 
scribe personality in this context, and 
from which we would expect types to be 
made apparent, for example the anal, 
oral, or genital types of early psycho- 
analytical literature. The Rorschach 
test, too, is meant to reach into the inner 
core of personality. But we can repre-
sent personality in terms of such “deeper”
traits, for example in terms of Beck’s
Rorschachian terminology; and the same
personalities can be described in terms of 
many different trait-universes, represent- 
ing different field conditions—such as I 
suggested for business, nursing, and army 
milieu. We contemplate, therefore, dif- 
ferent layers, so to speak, of trait-uni- 
verses. Here, in a nutshell, is the sta- 
tistical framework in terms of which we 
propose to study personality, with layers 
of trait-universes near to the inner core 
of perpetual motives, and outer layers 
surrounding these, and dependent upon 
the immediate field conditions. We can 
hope to determine, for such trait-uni- 
verses, whether motives are indeed auto- 
nomous, or whether they are rooted in 
the more or less permanent structures of 
causa finalis.


STATISTICAL DESCRIPTIONS


Having drawn a sample from a trait-
universe, it is an empirical matter to de- 
scribe any particular personality, or any 
aspect or character of it, in terms of the 
sample. We might begin by asking which 
of the traits in the sample are most char- 
acteristic of the given individual, either 
in a particular behavioral situation, or by 
assuming randomisation of such situa- 
tions, so that a certain generality is im- 
plied. Thus, if we wished to describe 
the immediate personality of Lady Mac- 
beth before the murder, high significance 
would be given to such traits as “top-full 
of direst cruelty,” “undaunted mettle,” 
and the like, and low marks would be 
given to “infirm of purpose” or to “O 
gentle lady.” Significance in this case is 
thus a subjective construct, representing 
an assessor’s judgment or intuition, in 
terms of which the traits of a sample are 
quantified. Each trait gains a mark for 
its significance in the personality in the 
given circumstances; but, as I have ex- 
plained elsewhere, the basis of quan-
tification may rest upon almost any of the 
constructs which are widely current in 
present-day personality or clinical study, 
such as, for example, “idealization,” “ra- 
tionalization,” “projection,” and the like. 
Or, the significance may be with respect 
to more objective indications, such as a 


straightforward count of the occurrences 
of different items of behavior, as in my 
account of the statistical description of 
performance.

For convenience (although I shall sug- 
gest that matters of necessity are in- 
volved) it is usual to assume that values 
for significance are normally distributed 
for large samples of traits, for each in- 
dividual. Moreover, since our concern is 
with types, and not with individual differ- 
ences for separate traits, we make the 
mean values for significance the same 
for all individuals, or for the same per- 
son who may be variously assessed with 
respect to one sample of traits.* I assume 
a normal distribution for significance in 
most studies for which Q-technique is 
suitable, and in practice this “works” 
very well. But in all cases the shape of 
the distribution could be a matter for em- 
pirical determination. If in fact it is un- 
reasonable to regard the differences be- 
tween the traits as normally distributed, 
then either the sample of traits, or the 
original trait-universe, is likely to be in- 
adequate; or else cognizance has to be 
taken of the non-normal distributions, 
and other more appropriate statistics 
used. But where the distributions are 
acceptably normal, I follow the practice
of requiring all assessments to conform 
to one frequency distribution, the stand- 
ard deviation of which is fixed by the size 
of the sample of traits, so that the means 
and standard deviations are the same, re- 
spectively, in all the personalities under 
examination. This is largely a matter of 
convenience. 
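
The practice of forcing every assessment into one frequency distribution can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure in detail: the eleven frequencies in FREQS are hypothetical (the paper does not list them) and were chosen to be symmetric, to sum to 121, and to give a standard deviation close to the 2.13 the paper reports for its 121-trait sample; the function names are likewise invented. Raw judgments of significance are ranked, and scores 0 to 10 are dealt out in the fixed frequencies, so that every array ends with the same mean and standard deviation.

```python
from math import sqrt

# Hypothetical symmetric frequencies for scores 0..10; they sum to 121.
FREQS = [2, 5, 8, 13, 20, 25, 20, 13, 8, 5, 2]

def forced_scores(raw):
    """Rank raw judgments and assign scores 0..10 in the fixed frequencies."""
    if len(raw) != sum(FREQS):
        raise ValueError("sample size must match the forced distribution")
    order = sorted(range(len(raw)), key=lambda i: raw[i])  # least significant first
    scores = [0] * len(raw)
    pos = 0
    for score, count in enumerate(FREQS):
        for i in order[pos:pos + count]:
            scores[i] = score
        pos += count
    return scores

def mean_sd(values):
    """Mean and population standard deviation of a list of scores."""
    n = len(values)
    m = sum(values) / n
    return m, sqrt(sum((v - m) ** 2 for v in values) / n)

# Arbitrary illustrative judgments for 121 traits; any raw scale would do.
raw = [((37 * i) % 121) / 10.0 for i in range(121)]
m, sd = mean_sd(forced_scores(raw))
print(m, round(sd, 2))
```

Because the mean and spread are fixed in advance, differences between two arrays can only be differences in which traits receive the high and low scores, which is the information a study of types requires.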

It is one of the crucial assumptions of 
correlational theory that any two varia- 
bles or functions that we wish to corre- 
late should be arranged in an order that is 
significant, that is, such that we have
some a priori reason for expecting some 
connection to exist between the orders of 
the two functions, to paraphrase Keynes. 
It is in this sense that the term signifi- 
cance is used above; and we use this as 
the basis for our correlational studies.

* This is usually a property of the trait-universe in the first place.

If there are reasons for supposing some
connection between one variable and an-
other, the existence of correlation will be
taken as affording at least a little induc-
tive support for these reasons or supposi- 
tions. Thus, having quantified Lady 
Macbeth’s personality before the murder 
in terms of a large sample of Shake- 
spearian traits, and also Othello’s for the 
same sample, we may correlate the two, 
to find any support that there may be for 
a hypothesis that similar psychological
types are involved in the two characters, 
or else to provide evidence for denying 
the suggestion. 

I hope it is clear, then, that there is no 
question in these studies of following 
factor psychologists into the vast realms 
of interspace in the grand manner of Pro- 
fessor Thurstone or of Sir Cyril Burt.
I merely wish to provide a little inductive 
support, every now and then, for certain 
psychological hypotheses, and, to this end, 
I have reserved the right to distinguish 
between universes which I sample (and 
which provide me with error estimates), 
and variables (such as personalities) that 
I wish to manipulate deductively, and to 
which sampling conditions in no way ap- 
ply. Similarly, in the study of individual 
differences one defines the populations 
of persons—otherwise all is at the mercy 
of differences in heterogeneity of sam- 
ples—and thereafter mental tests are used 
to represent psychological hypotheses, and 
to which sampling procedures do not ap- 
ply. I have already provided several ex- 
amples of the use of Q-technique in
which deductions are represented in cor- 
relational terms, which are thereupon 
studied in the above manner. 

It should be clear, then, that the con- 
cern is with statistical distributions, each 
a personality description in the form of 
a frequency distribution. I can read down
these, much as I read through a literary 
description of perhaps the same personal- 
ity: the one has just about as much mean- 
ing as the other. But, having assessed 
any one person (or he may, of course, 
assess himself) for a given sample of 
traits, he may be correlated with other 
persons who have been similarly assessed, 
for the same sample of traits. It is the 
correlation between such assessments, 
usually for several “independent” vari- 
ables at a time, that constitutes the de- 
scriptive statistics of Q-technique. 
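
A table of correlations between persons can be sketched in a few lines. The example assumes three persons assessed on the same six traits with invented scores (the labels A, B, C and all values are hypothetical). The point of the sketch is only the direction of the computation: the persons are the variables, and the traits are the sample over which each coefficient is taken.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length arrays."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical assessments of the same six traits by three persons (scores 0..4).
arrays = {
    "A": [4, 3, 2, 2, 1, 0],
    "B": [3, 4, 2, 1, 2, 0],
    "C": [0, 1, 2, 2, 3, 4],
}
persons = sorted(arrays)

# Rows and columns are persons; each cell correlates two persons over the traits.
q_matrix = {p: {q: pearson(arrays[p], arrays[q]) for q in persons} for p in persons}

for p in persons:
    print(p, [round(q_matrix[p][q], 2) for q in persons])
```

In this toy table A and B resemble each other while C is the reverse of A, which is the kind of pattern a common bipolar factor would summarize.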


JUNG’S TYPE PSYCHOLOGY

It is not difficult to express most of
what I have said already in mathematical 
or statistical terms; many well-known 
correlational theorems are involved, too, 
but the formal handling of these matters 
is outside the scope of the present paper. 
It may be more useful to indicate, how- 
ever briefly, how Q-technique is applied 
to the systematic study of a system such 
as Jung’s. 

As is well known, Jung’s central con- 
cept is one of individuation, the impulse 
of a person to distinguish himself as a 
“single, separate, person” from the col- 
lective en masse. The introvert and extra- 
vert attitudes are the most general forms 
of this process; these are conceived as 
outcrops or resultants from the cultural 
or social-psychological fields in which the 
person is reacting. It follows, therefore, 
that in one field a person may be intro- 
verted, and in another extraverted, al- 
though most habitually he may tend to 
cling to one type or the other in almost 
all fields. The individual should be of 
neither type in any habitual way, how-
ever, if he is truly and adequately in-
dividuated (to coin the word). Only in
this way can one understand Jung’s refer- 
ences to the bifurcation into Introvert and 
Extravert types as in some sense a fail- 
ure of man to adjust himself properly:
Schiller is quoted as having said that “It
was culture itself that dealt this wound
to modern man,” i.e., making the split
into the two habitual types. But man’s
abilities or psychological functions also
mediate in the individuation, and Jung
adds, on this account, his well-known
quaternary, the Thinking, Feeling, Intui-
tion, and Sensation functions. Similarly
he has to find a place for conscious and 
unconscious reactions, and notes, in this 
connection, the tendency for a conscious 
function to be attended by a contrasting 
“inferior” function in unconscious reac- 
tions: the dominantly Thinking Extra- 
vert, for example, gives himself away un- 
consciously by many Feeling type re- 
sponses. 

Broadly, then, two main types are dis- 
tinguished, but no one is possessed of the 
one attitude with complete atrophy of the 
other. The two types are “. . . of such
a superficial and inclusive nature that it
permits of no more than a rather general
discrimination. A more exact investiga-
tion . . . . yields great differences be-
tween individuals who none the less be-
long to the same group.” Difficulties
in placing a person in his proper type are 
referred to by Jung, such as that due to 
the process of “compensation,” whereby 
an Origen (basically a Feeling Extravert)
castrates himself, and assumes the Think- 
ing type instead. Corollaries of some 
interest are also given: “phantasy,” for 
example, is held to bridge the broken gap 
between the claims of introversion and 
extraversion. It is said that the extra- 
vert has a certain “repugnance, fear, or 
silent scorn” for introversion, as the in- 
trovert has no less for extraversion; and,
whilst the two main types can be distin- 
guished with ease, according to Jung a 
sound discrimination of the functional 
additions requires, instead, a “very wide 
experience.” There are at least thirty of 
these inferences or speculations, dotted 
through the pages of Jung’s works, and 
there were still others, at the time the 
book was written, concerning the relation 
of I-E to the primary and secondary func-
tions of Otto Gross, and to perseveration
(Spearman). 

Experimentation in this region would 
consist, first, of specifying the conditions 
under which these various conceptions 
hold true, if at all; one would hope, in 
more subtle studies, to penetrate into the 
unconscious, as well as the conscious, 
matters at issue. Initial studies would be 
at the “superficial and inclusive” level. 
Some of the latter could well follow the 
lines given below. 

I had available for a series of prelimi- 
nary studies the 2000 trait-universe of 
Jungian terms already mentioned, and in 
terms of samples drawn from this list, 
usually at random, the following prob- 
lems, amongst others, seemed to be worth 
attention, and illustrate the techniques un- 
der consideration. 


A. Under what conditions are the fol-
lowing distinguishable:

(i) the two main types?

(ii) the functional additions to
these types?

(iii) the “inferior” functions?

B. What evidence is there for the
greater or lesser differentiation of
personalities?

C. What evidence may be given for
the “repugnance, fear, or silent
scorn” of extraverts for introver-
sion, and of introverts for extra-
version?

D. Is perseveration related to I-E?


There are several Q-technique ap- 
proaches to each of these problems, but 
the following will afford some idea of the 
methods required for their elucidation. 


REPRESENTING I-E TYPES


A sample of 121 traits was drawn at 
random from the Jungian trait-universe. 
Each member of a graduate class was 
asked to describe in terms of this sample 
(a) his conception of an ideal introvert
(I), (b) of an ideal extravert (E), and
(c) his idea of himself (S). The order
of assessment was randomised for the
different persons, and the same frequency
distribution was employed for all assess-
ments by all persons. It has eleven class
intervals (0 to 10), providing a mean
score of 5.0 for significance, and a stand-
ard deviation of 2.13. Fifty graduates
made each of these three assessments,
and as a class exercise each also corre-
lated his own three arrays, i.e., r_IS, r_ES,
and r_IE. None of the graduates had stud-
ied Jungian typology, but we assumed that
all had the usual stereotyped notion of
what is meant by introversion and extra-
version, for these terms are now part of
common parlance. Thus if r_IS is posi-
tive, the person so correlating is likely to
be an introvert; and if r_ES is positive, an
extravert instead. The correlation r_IE
was expected to be high, and negative, 
and this in fact was the result for all per- 
sons. On these lines a preliminary sort- 
ing of the graduates was made possible;
it appeared, incidentally, that more men 
regard themselves as extravert, and more 
women, introvert, as Jung supposed. 
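
The preliminary sorting described above amounts to a small decision rule. The sketch below is an assumption-laden illustration: the variable names r_is and r_es stand for a graduate's correlation of his self-description with his ideal introvert and his ideal extravert respectively, the coefficients are invented, and the "undetermined" outcome for cases where both or neither correlation is positive is added here because the text does not say how such cases were handled.

```python
def classify(r_is, r_es):
    """Label a self-description by the ideal type it correlates with positively."""
    if r_is > 0 and r_es <= 0:
        return "introvert"
    if r_es > 0 and r_is <= 0:
        return "extravert"
    return "undetermined"  # both positive or neither: the stated rule gives no verdict

# Hypothetical coefficients for three graduates, as (r_is, r_es).
graduates = {"P1": (0.46, -0.31), "P2": (-0.22, 0.51), "P3": (0.05, 0.08)}
labels = {name: classify(*rs) for name, rs in graduates.items()}
print(labels)
```

Since the two ideals were found to correlate highly and negatively for every person, the undetermined branch should rarely arise in practice; it is kept so that the rule never forces a label the data do not support.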

This typification can be validated in 
many ways, for example by independent 
assessments made by the students about 
each other. For the present purpose, 
however, our interest is in the correlations 
between these students. Calculating all 
the correlations for 50 persons would be
a formidable task, since 11,175 coeffi- 
cients are involved. But nothing like as 
many ever need attention. Instead, four 
sets of persons may be selected from the 
fifty, five who regard themselves as extra- 
verts, and five who consider that they 
are introverts; similarly ten women, five 
of each type, may be chosen. Tables are 
then calculated for these small sets, or
for combinations of them. Table 1, for 
example, is for five women who consid- 
ered themselves to be introverts, to judge 
by their correlations with their own ideal 
types. Similar tables are available for 
any of the sets. 
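
The figure of 11,175 coefficients agrees with counting every pair among 150 arrays, that is, three descriptions (I, E and S) from each of 50 persons. The short check below makes that arithmetic explicit; reading the figure this way is an inference from the numbers, and the five-person count assumes that a small set is tabled with all three arrays per person.

```python
from math import comb

persons, arrays_each = 50, 3                 # I, E and S descriptions per graduate
all_pairs = comb(persons * arrays_each, 2)   # every array against every other

subset_pairs = comb(5 * arrays_each, 2)      # a five-person set, three arrays each

print(all_pairs, subset_pairs)
```

Working with sets of five therefore reduces the labor from many thousands of coefficients to about a hundred per table.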

An analysis indicates that only one (a
bipolar) factor is required to account for 
the ideal representations for these five 
women in Table 1. The saturations are 
in row (i), and these account for all the 
correlations between the ideals, with no 
significant residuals. Only one factor, 
likewise, is necessary for the S-assess- 
ments among themselves: their satura- 
tions are in row (ii). When both of these 
factors are removed from the table in
turn, only four significant residuals re- 
main, for the correlations between four 
of the women, for their S-assessment and 
their description of the ideal introvert.
That is, they appear to mirror something
of themselves in the ideal of their own
type, which in this case was introvert.
The same result appears for each of the
other sets, except that where the student
is extravert, the specific residual is with
his or her ideal extravert type. We find,
then, that not only is it simple to repre-
sent types in the above manner, and that
significant types can be demonstrated, but
that the residuals, if any, are apparently
related to projective processes.

Again, by placing the men and women
of a type in one correlation table it is
easy to test whether a sex difference must
be assumed. As Table 2 indicates, the
same inclusive type appears, but the men,
and the women, give special slants to it,
stereotypies perhaps of their own culture.
In short, sex differences are at once
proven.


TABLE 2.

Product-moment correlation coefficients for five men and five women intro-
verts, for 121 traits drawn randomly from a Jungian trait-universe, to illustrate the
inclusive nature of the type-factor, and a sex difference

[Table body: self-assessments of the women and self-assessments of the men; the individual coefficients are not legible in this copy.]

(The type-saturations (i) are in terms of the women’s type, i.e., for persons A, B,
C, D, E as reference values. It is obvious that the saturations for the men, on this
basis, will not account completely for their own intercorrelations: more than two fac-
tors are thus necessary to represent all the correlations in the table. But the women
alone can be accounted for by a common factor; and the men alone, likewise; these
two common factors can explain all the correlations in the table. The difference be-
tween these two factors can be explained by the sex difference. The table as a whole,
on the other hand, supports the view that all alike, the men and the women, are intro-
vertive.)


Self-assessments of the above kind, and
ideal representations too, can be made by
all sorts and conditions of men, women,
and children down to ten years of age.
In no case need more than a dozen suit-
able persons be correlated at a time for
any particular study, from which to prove
or disprove previous assertions. The as-
sessments of parents and their children,
for example, can stand alone in one cor-
relation table; they show at once that
I-E is no respecter of persons within a
family, for some members of the family 
can be extravert, and others introvert. 

Factorizing in this way, however, is 
just the first step in analysis of this kind 
of data. Best-weighted estimates can be 
made of any of the factors. That is, in 
the above case each of the 121 traits can 
be given a score for its significance, ob- 
tained by weighting the scores, trait for 
trait, of the five persons concerned, so 
as to provide the best possible estimate of 
the type. The study of these arrays, 
representing hypothetical types, and the 
examination of specificities (see below), 
is normally the stepping-off ground for 
further experiments, for the definition of 
key-qualities, and much else besides. 
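
The weighting step just described can be sketched in a few lines. This is only an illustration under stated assumptions: the weights are taken to be of the Spearman kind, r / (1 - r^2), which the text does not spell out, and the persons, trait scores, and saturations below are hypothetical.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Express one person's trait scores as deviations in standard units."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def type_array(person_arrays, saturations):
    """Weighted composite score per trait, one weight per person."""
    # Assumed weighting: r / (1 - r^2), so better-saturated persons count more.
    weights = [r / (1.0 - r * r) for r in saturations]
    standardized = [standardize(a) for a in person_arrays]
    n_traits = len(person_arrays[0])
    return [sum(w * z[t] for w, z in zip(weights, standardized))
            for t in range(n_traits)]

# Hypothetical data: three persons rating six traits.
persons = [[4, 2, 0, -2, -4, 0],
           [3, 3, -1, -1, -4, 0],
           [4, 1, 1, -3, -2, -1]]
saturations = [0.8, 0.6, 0.5]
array = type_array(persons, saturations)
order = sorted(range(len(array)), key=lambda t: array[t], reverse=True)
print(order)   # trait indices from most to least significant for the type
```

The resulting array plays the part of the hypothetical type: traits at the head of the ordering are those most characteristic of it, and traits at the foot are those most rejected.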

From the reliability coefficient (r_{aa}) 
of a person’s self-assessment (taken from 
a repeat appraisal with the same sample 
of traits), and the factor-saturation of his 
self-assessment with his type, part of his 
standard variance can be assigned to 
Specificity, in terms of the following 
expression: 


r_{aS}^2 = r_{aa} - r_{aT}^2 


(a is the person, S the specificity, T the 
type-factor)(4). The more reliable the 
self-assessment, and the lower the 
T-saturation, the greater is the person’s 
specificity. From the best-weighted array 
for the main type, in comparison 
with a person’s original array, it is a 
simple matter to isolate the traits to which 
this specificity must be attributed, and 
the study of these, as much as of the 
type arrays themselves, is of great in- 
terest. There is more than a suggestion, 
for example, that these specificities “give 
away” the unconscious or “inferior” 
functions referred to by Jung, and they 
also involve the functional types them- 
selves. 

Before leaving this example it is worth 
noting that the specificities tend to be at 
least as large as the type-saturations them- 
selves, taking care, in this manner, of the 
fact that although persons may be of a 
type, they nevertheless may differ con- 
siderably from each other. It is easy to 
show, too, that the above facts occur for 
any large enough sample drawn from the 
trait-universe. 
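
As a small worked illustration of the specificity expression, the sketch below assumes that the expression is to be read in variance terms, that is, that the reliability less the squared type-saturation gives the share of unit variance assigned to specificity. The two coefficients are hypothetical values, not taken from the tables.

```python
import math

def specificity_variance(reliability, type_saturation):
    """Share of standard (unit) variance assigned to specificity."""
    value = reliability - type_saturation ** 2
    if value < 0.0:
        raise ValueError("saturation too high for the stated reliability")
    return value

r_aa = 0.80   # hypothetical repeat-appraisal reliability
r_aT = 0.60   # hypothetical saturation with the type
s2 = specificity_variance(r_aa, r_aT)
print(round(s2, 2), round(math.sqrt(s2), 3))   # 0.44 0.663
```

With these figures the specificity (0.44 of the variance) exceeds the share taken by the type (0.36), which is the situation described above: persons may be of a type and still differ considerably from each other.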


THE JUNGIAN FUNCTIONAL TYPES 


These, it should be noted, are merely 
additional to the main types, and not 
separate from them in any sense. I 
find that the device of unfolding a per- 
son’s specificity, just referred to, gives 
the readiest indications of his functional 
tendencies. That is, he tends to indicate 
his functional type in terms of a relatively 
small number of traits only, specific to 
himself, rather than in terms of a well- 
rounded common factor. But if large 
samples are drawn from the Jungian 
trait-universe (say 400 traits), more than 
one factor is usually required to account 
for the data for persons of one main 
type; in other words, two clusters of cor- 
relations, at least, become apparent (much 
as is illustrated for Table 2, for sex dif- 
ference in that case), one usually indica- 
tive of the Thinking type, and the other 
the Feeling. Rather more subtle matters 
are involved for the Sensation and the In- 
tuition functions. It should be remem- 
bered that non-fractional factors are used 
to represent all such types, i.e., there is 
no question of searching for orthogonal 
factors, although, for other purposes, 
studies in terms of simple structure may 
be necessary. 


DIFFERENTIATION OF A PERSONALITY 


It may be argued that the less one be- 
longs to a common (stereotyped) type, 
the more differentiated or individual one 
is likely to be, other things being equal. 
Many counter-influences, however, have 
to be controlled before the facts begin to 
speak for themselves. Thus, a person of 
“mixed” I-E type (i.e., who correlates 
with both types, as some persons do), 
may suffer markedly from “compensation,” 
wishing to be what he isn’t, and so 
assuming a false air of individuation. 
Insecure persons, for example, are apt to 
do this. It is only by attending to the 
specificities, again, that much direct evi- 
dence of the differentiation of a personal- 
ity can be gained. A simple test of the 
matter is to compare two families from 
widely different social strata, to show that 
the parents of the more sophisticated 
family differentiate themselves relative 
both to their own children, and to the less 
fortunate family as a whole. Or, again, 
a comparison may be made with results 
for “spontaneity” and “maturation” as 
indicated by the Rorschach test: ten per- 
sons are chosen, all one main I-E type, 
who make self-assessments on a Jungian 
sample, and who are given the Rorschach 
test as well. They may then be assessed 
in terms of a Rorschachian trait-universe, 
as suggested in an earlier paper, and 
also in terms of a parallel trait-universe 
of everyday terms. One expects to find 
that persons who are least stereotyped in 
terms of the Jungian traits, have traits in- 
dicative of high spontaneity and rich and 
harmonious development on the Ror- 
schach test or its trait-universes. It would 
take us too far afield to provide evidence 
of the kind required, but the approach is 
quite clear, and preliminary results are in 
accord with expectancy. 


THE REPUGNANCE OF AN EXTRAVERT 
FOR INTROVERSION 


I am indebted to two students in the 
Department of Psychology at Chicago for 
a neat representation of this kind, and of 
the presumed “silent scorn” of an intro- 
vert for extraversion. The facts turn 
out to be other than was anticipated, but 
they are suggestive none the less, and in- 
dicative of a useful approach to the prob- 
lem. The well-known Szondi photographs 
were treated as though they were a sam- 
ple from a universe of such photographs, 
and were assessed by five introverts and 
five extraverts with respect to signifi- 
cance for which portrait was liked most. 
The ten subjects also assessed themselves 
on a 121 sample of traits from the Jungian 
universe. Two tables of correlations 
were thus made available, one for the I-E 
traits, and the other for the Szondi cards, 
each for the ten persons as variables. The 
I-E types appear, as usual, as two common 
factors in the one table; and corresponding 
types emerge from the Szondi 
table. When the hypothetical types for 
the latter are calculated, some notion of 
what the introverts and extraverts think 
about these cards is formed, at once, by 
taking apart the five cards of least 
significance in the respective type (i.e., the 
most disliked). In the case of the intro- 
verts these five cards turn out to be those 
which portray apathy and shut-in quali- 
ties, most strikingly and to a psychotic 
degree; whereas for the extraverts the 
five cards depict obvious mania, over- 
excitement, and exaggerated emotional- 
ity. In short, the introverts and the ex- 
traverts seem to fear, not their opposite 
attitudes, but the reflections of their own 
extreme types, the extreme conditions of 
their own kind. But a different assort- 
ment of cards, in which nothing so ex- 
treme is depicted as the psychotic condi- 
tions of the Szondi cards, could undoubt- 
edly put Jung’s other observation to a test 
of this kind. 

What may be described, reasonably, as 
unconscious influences of the above kind 
appear to be readily evinced in these va- 
rious Q-technique studies; I have already 
reported, for example, that extraverts 
tend to see something in another extra- 
vert who is well known to them, that in- 
troverts under the same conditions fail 
to notice or be aware of (8). Under 
suitable conditions the “clouding” effect to 
which Jung called attention, whereby a 
person of one type is supposed to have 
little insight into the personality of per- 
sons of the opposite type, can also have 
attention in correlational studies of the 
above kind. 


RELATION BETWEEN PERSEVERATION 
AND I-E 


I add this study, very briefly, in order 
to remind the reader that there is nothing 
unique about the Jungian types; common 
factors appear for all sorts of universes 
besides the Jungian. In the present example, 
for which I am indebted again to 
students at Chicago who participated in 
class exercises on correlational theory, a 
450 trait-universe was defined of terms 
indicating perseveration, or non-persev- 
eration, or of primary and secondary 
functions: the students merely culled the 
extensive literature on these topics for 
these traits. Five introverts, and four 
extraverts, who had already made self- 
assessments on the 121 Jungian traits, 
also represented their personalities on a 
sample of 121 traits chosen at random 
from the perseveration universe. The resulting 
correlations are shown in Table 3, 
for the nine persons concerned, for the 
perseveration sample. But a comparable 
table had already been calculated for the 
I-E traits, and clearly homologous types 
appear for the two samples, centered on 
the same persons: the introverts are of 
one type in terms of the perseveration 
traits, and the extraverts of another. We 
infer at once that the I-E and the perseveration 
traits are in some degree comparable, 
as Jung originally thought they might 
be. But now a direct comparison can be 
made of the respective traits in their orders 
of significance for the types involved, 
to see at once what really goes with what. 


TABLE 3. 

Table of correlations for five introverts and four extraverts for a sample of 
121 perseveration traits drawn at random from a larger trait-universe of such traits. 

Introverts 
3 
Extraverts 
2 3 
499 409 
433 
627 681 .656 645 
347 .362 
382 329 
336 .263 
385 353 
.282 .238 
499 496 
458 
655 460 542 480 

(The type-saturations (i) are in terms of the introverts as reference values. It is 
obvious that those for the four extraverts do not account for the correlations between 
these extraverts, so that two common factors are required to account for the table. One 
common factor is centered on the five introverts, and the other on the four extraverts, 
and these can explain all the correlations in the table.) 


CONCLUSION 


It is suggested, then, that Q-technique 
reaches pertinently into hovel study, 
at least in a descriptive manner, if only 
because it centers about a few persons 
rather than a universe of them, and about 
a universe of particulars, rather than 
about a few highly generalised traits. An 
example has been given elsewhere of 
the application of the technique to the 
study of functional autonomy in terms of 
different layers of trait-universes, and 
several other illustrations of its usefulness 
have been at least indicated in the above 
pages. The assessment of samples of 
traits, drawn randomly from trait-universes, 
has been conducted above in terms 
of two constructs only, one concerning 
ideal types, and the other the “characteristic” 
quality of traits in terms of which 
an individual describes his own or other 
persons’ personalities. But many other 
constructs of this kind can be used direct- 
ly or indirectly, embracing in this way 
such matters as projection, rationalisa- 
tion and the like. The examples have all 
been drawn from subjective sources, but 
these provide significant correlations and 
so assume a certain objectivity, and in any 
case the technique is by no means re- 
stricted to these subjective appraisals. 

Types, each represented by a common 
factor, and which are accepted at face 
value, however much they correlate with 
other types, are merely convenient sources 
of empirical classifications, and no as- 
sumptions are involved that relate to uni- 
tary functions such as are sought for in 
studies of individual differences. It 
should be clear that the technique brings 
individual differences to light, but only 
with regard to types, that is, for certain 
whole aspects of personality. These, like 
sheep, exist in their own right. 


Numerous instruments have been made 
available to the clinician and guidance 
counselor to aid him in determining the 
achievement, aptitudes, attitudes and in- 
terests of his clients. To interpret the 
results obtained with these instruments, 
scales are needed. The latest major 
contribution to scaling in the area of 
achievement tests was Flanagan’s development 
of Scaled Scores for the Cooperative 
Tests in 1937. The more recent and 
growing interest in scaling procedures has 
been focused largely on attitude scaling. 
McNemar’s paper on opinion-attitude 
methodology reviews critically the exten- 
sive literature on this problem and dis- 
cusses the method of Thurstone, Likert, 
Remmers and others as well as the more 
recently developed Scale Analysis of 
Guttman. 

In general, the makers of attitude scales 
have used what might be called the di- 
rect approach in which the scale maker 
has constructed his instrument in such a 
way that the scores yield a scale directly, 
without conversion. McNemar argues 
that such scales (called ordinal scales by 
Stevens) furnish only information concerning 
relative rank order among individuals 
to whom they are applied. 

Syracuse University 

Although ordinal scales are adequate 
when we wish to know whether one per- 
son is superior to another or simply 
whether improvement — regardless of 
amount—has taken place, they are inade- 
quate in two important respects. In the 
first place they do not permit the deter- 
mination of the amount of growth of an 
individual in a particular trait. Nor do 
they, in the second place, permit the com- 
parison of differences in performance of 
individuals in a particular trait. To pro- 
vide these two kinds of information, inter- 
val scales (that is, scales having equal 
units throughout the range of the scale) 
are needed. 

Since even the earliest aptitude and 
achievement scales were developed with a 
view of the solution of these two prob- 
lems, their history will provide the history 
of the development of interval scales in 
education and psychology. Hence in this 
paper I propose briefly to review this history, 
to comment on the applicability as 
interval scales of a few of the different 
systems in common use with achievement 
tests, and to describe the procedure fol- 
lowed in the development of a scale in 
which the usual assumption of normality 
of distribution is unnecessary. 

When we consider the relative meager- 
ness of the information furnished by the 
raw score on a valid reliable test, the de- 
sirability of converting the raw score 
scale into a more useful scale is at once 
apparent. Since raw scores are obtained 
by summing the number of correct re- 
sponses, the assumption that they consti- 
tute an interval scale is unwarranted un- 
less we have an “ideal” test in which: 


1. The difficulty of all items has been 
precisely determined, and no item more 
difficult than the first item failed 
would be answered correctly and no 
item less difficult would be answered 
incorrectly, and 
2. Each item represents a level of 
achievement just one unit higher than 
the preceding one. 


Of course, these criteria are rarely met 
in practical situations. Hence the raw 
score on a reliable test provides only an 
ordinal scale on which the various indi- 
viduals are ranked in the ability tested. 
Thus the descriptive information given 
by a raw score is rather meager since the 
raw score is a function only of the num- 
ber and difficulty of the items. 

To make the results of tests more use- 
ful, numerous types of scores and norms 
have been provided. Some of these sys- 
tems of scores have been devised merely 
to simplify presentation and interpreta- 
tion; other systems, on the other hand, 
have been devised for the purpose of pro- 
viding measuring units that are equal 
throughout the scale. However, the users 
of the various systems of scores often 
have not been careful to distinguish be- 
tween these two different functions. 

Age, grade, percentile rank and stand- 
ard scores have all been developed for the 
purpose of simplifying interpretation. 
Although age and grade scores are more 
comprehensible to the layman than many 
other types, they should not be considered 
to provide interval scales, since growth 
in any specific function is generally not 
considered constant throughout the 
grades nor from year to year. A system 
of scores based on percentile rank, al- 
though useful because of its universal ap- 
plicability and simplicity, is not appro- 
priate for use as an interval scale since 
such use is equivalent to assuming the 
trait has a rectangular distribution. In 
most traits psychologists and biologists 
are unwilling to accept the assumption 
that the trait is rectangularly distributed. 
Since standard scores are obtained from 
the raw scores by subtracting a constant 
and dividing that difference by another 
constant, it can be seen that units thus 
obtained are directly proportional to the 
raw scores and hence do not furnish 
interval scales unless the raw scores them- 
selves form an interval scale. 
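
The two points made in this paragraph can be checked on a handful of scores. The sketch below uses five hypothetical raw scores; the mid-percentile definition of percentile rank is an assumption chosen for the illustration.

```python
from statistics import mean, pstdev

def standard_scores(raw):
    """Subtract a constant (the mean) and divide by a constant (the S.D.)."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]

def percentile_ranks(raw):
    """Per cent of cases below each score, counting half of any ties."""
    n = len(raw)
    return [100.0 * (sum(y < x for y in raw) + 0.5 * sum(y == x for y in raw)) / n
            for x in raw]

raw = [12, 15, 15, 20, 31]          # hypothetical raw scores
z = standard_scores(raw)
pr = percentile_ranks(raw)

# Standard scores keep the ratio between any two raw-score differences,
# so they are no nearer to equal units than the raw scores were.
ratio_raw = (raw[4] - raw[3]) / (raw[1] - raw[0])
ratio_z = (z[4] - z[3]) / (z[1] - z[0])
print(round(ratio_raw, 4), round(ratio_z, 4))

# Percentile ranks space the cases evenly, whatever the raw distances.
print(pr)
```

In this example a raw difference of 11 points at the top of the range earns a smaller percentile step than a raw difference of 3 points at the bottom, which is what treating percentile ranks as units would imply for any trait that is not rectangularly distributed.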

In science (and this is especially ap- 
parent in the physical sciences), two gen- 
eral methods have been used to establish 
equality of units: (1) by direct compari- 
son and (2) by definition, i.e., certain as- 
sumptions are made and the units are ob- 
tained by operations based upon the as- 
sumptions. Mental traits obviously can- 
not be superimposed and compared di- 
rectly; so, like time, units of mental 
measurement must be made equal by 
definition. 

Various definitions of units of mental 
ability have been utilized. Several systems 
have been based on various growth 
curves. The most familiar of these units 
are the isochron scores of Courtis which 
are defined in terms of the Gompertz 
curve. However, one of the earliest and 
certainly the most commonly used at- 
tempt to obtain an interval scale was the 
utilization of the relationship between 
the shape of the frequency distribution 
and the units used to obtain it. If the 
units of a trait are uniform throughout 
the entire scale, conclusions may be drawn 
about the shape of the distribution upon 
measuring a large group of subjects. 
Conversely, if one is willing to make as- 
sumptions about the shape of a distribu- 
tion, he can then define his units in terms 
of the distribution of the trait. 

(Illustration of the procedure followed in fitting a pair of 
overlapping skewed distributions to a common abscissa. The 
curves are represented by an equation in which scores are 
in terms of standard deviation units and a represents 
the skewness.) 

Following the teachings of Galton and 
Quetelet, the assumption underlying the 
search for quantitative units has been that 
mental ability is normally distributed and 
that equal segments of the base of a normal 
curve mark off equal units of mental 
ability. The T-scores of McCall provide 
one illustration of this method.  Al- 
though standard scores with an arbitrarily 
assigned mean of 50 and standard devia- 
tion of 10 are popularly called T-scores, 
it should be noted that a T-score as de- 
fined by McCall is a normalized score. 
One of the problems associated with this 
type of score is the selection of the par- 
ticular distribution for which the scores 
are to be normalized. Clearly, if a single 
distribution contains several sub-groups 
with different means and standard devia- 
tions, the shape of this distribution will 
change upon the removal or addition of 
such sub-groups. A second difficulty with 
units based upon a single distribution is 
that the units at the extremities of the 
scale will tend to be unreliably scaled. 
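
The distinction between the popular T-score and the normalized T-score can be sketched as follows. The scores are hypothetical, and the mid-percentile convention used to obtain proportions is an assumption made for the illustration, not a detail taken from McCall.

```python
from statistics import NormalDist, mean, pstdev

def linear_t_scores(raw):
    """Standard scores rescaled to mean 50 and standard deviation 10."""
    m, s = mean(raw), pstdev(raw)
    return [50.0 + 10.0 * (x - m) / s for x in raw]

def normalized_t_scores(raw):
    """Scores located on the base of a normal curve through their proportions."""
    n = len(raw)
    unit_normal = NormalDist()
    out = []
    for x in raw:
        below = sum(y < x for y in raw)
        ties = sum(y == x for y in raw)
        p = (below + 0.5 * ties) / n            # proportion below the score
        out.append(50.0 + 10.0 * unit_normal.inv_cdf(p))
    return out

raw = [1, 2, 3, 4, 100]                         # hypothetical, one extreme case
print([round(t, 1) for t in linear_t_scores(raw)])
print([round(t, 1) for t in normalized_t_scores(raw)])
```

The linear scores simply reproduce the shape of the raw distribution, extreme case and all, whereas the normalized scores depend only on the proportion of the chosen group falling below each score. That dependence is why the choice of the group matters.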

Flanagan overcame some of the weak- 
nesses present in the use of a single nor- 
mal distribution by utilizing a series of 
overlapping normal distributions. His 
scores are more accurately scaled at the 
extremes and it was possible to use sev- 
eral groups in their own form thus avoid- 
ing the problem of the effect of combin- 
ing unlike distributions. 

Although the assumption of normality 
has been of inestimable value in the history 
of measurement, many situations 
arise in which the assumption of normal- 
ity seems unwarranted. Since it often 
seems more reasonable to expect a skewed 
distribution rather than a normal distri- 
bution, a method recently developed by 
the speaker will be outlined. In this 
method an ordinal scale is transformed 
into an interval scale by the use of the 
more general Pearson Type III Curve 
represented by the equation 
in which t represents a score as a deviation 
from the mean in terms of standard 
deviation units and a, the skewness. 
Since the process to be outlined involves 
the determination of the appropriate 
skewness for each of the overlapping fre- 
quency distributions and since the normal 
curve is a Type III curve with zero skew- 
ness, the validity of the traditional as- 
sumption of normality can be examined. 
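
For orientation, the standardized Type III curve is commonly written in the form below. This is a sketch under assumptions: a is taken to be the skewness named in the text, t the deviation from the mean in standard deviation units, and y_0 the constant that makes the total area unity. The notation may differ from that of the equation as originally printed.

```latex
y = y_0 \left(1 + \frac{a\,t}{2}\right)^{\frac{4}{a^{2}} - 1} e^{-2t/a},
\qquad t > -\frac{2}{a},
\qquad
y_0 = \frac{k^{\,k - 1/2}\, e^{-k}}{\Gamma(k)}, \quad k = \frac{4}{a^{2}} .
```

As a approaches zero this curve approaches the normal curve, so that a fitted skewness near zero is evidence for, and a fitted skewness well away from zero is evidence against, the traditional assumption.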

Since the technique has already been 
used to establish units of measurement 
called K-units, which were based on over- 
lapping grade frequency distributions, we 
will use grade distribution in the trait 
word meaning, as illustrative material. 
The fundamental procedure for obtaining 
K-units consisted of fitting a series of 
overlapping frequency curves (Pearson 
Type III Curves) on the same abscissa in 
such a manner that the proportions of in- 
dividuals in consecutive grades who ob- 
tain the same score in a particular trait 
correspond to the proportions given by 
the sample data. Equal units were then 
defined as equal distances along the common 
abscissa. 

It can be seen that comparable proportions 
for each distribution at four selected 
score points common to the two distribu- 
tions will be sufficient to determine the 
three constants. For example (see Charts 
A & B) let us select four score points 32, 
42, 51 and 60 which are common to the 
Grade 4 and 5 frequency distributions 
and below which there are respectively in 
Grade 4: 11%, 49%, 80%, and 95% of 
the cases and in Grade 5: 3%, 22%, 53%, 
and 81% of the cases. If any two Type 
III Curves are now selected arbitrarily, 
one to represent the frequency distribu- 
tion of Grade 4, say with skewness .4 
and the other to represent the frequency 
distribution of Grade 5, say with skew- 
ness .5, it would not be expected that the 
four points a, b, c, d cutting off respec- 
tively: 11%, 49%, 80%, and 95% of the 
area of the curve with skewness .4 could 
be superimposed upon the four points a’, 
b’, c’, d’, which cut off respectively: 3%, 
22%, 53%, and 81% of the area of the 
curve with skewness .5. In general, a’ 
can be superimposed on a, but the other 
three pairs of points would not coincide. 
By the appropriate choice of ratio be- 
tween the standard deviation of each dis- 
tribution, a linear transformation can be 
found which would make b’ fall on b 
while a’ still coincides with a. And then, 
further, by simultaneous appropriate 
changes of the skewness of the second dis- 
tribution and a second adjustment in the 
ratio between their standard deviations, 
the point c’ can be made to coincide with 
c while at the same time maintaining the 
coincidence of a and a’ and b and b’. 
Finally, by simultaneously altering the 
skewness of both distributions and the 
ratio between their standard deviations all 
four points can be brought into coinci- 
dence (see Chart C). Hence, exactly four 
points common to the two frequency dis- 
tributions are needed to determine the 
appropriate skewness of each of the two 
grade distributions and the ratio between 
their standard deviations such that the 
area of each curve below each of the four 
selected points on their common abscissa 
is equal to the proportion of cases in the 
data. 

The fitting was done with the aid of a 
set of Salvosa’s Tables of Pearson 
Type III Function in which areas for 
curves with unit standard deviations are 
tabled for skewness values ranging from 
0 to 1.1. The fitting procedure, an iterative 
procedure, was self-corrective in the 
sense that the poorer the hypothesized 
choice of skewness the further from coin- 
cidence were the selected points. Hence 
it was possible by successive hypothesized 
values of skewness and successive linear 
interpolations to cause coincidence of the 
four pairs of points to the degree of ac- 
curacy given by Salvosa’s Tables. 
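
The logic of the fitting can be sketched in code. This is not the procedure actually used: the areas are computed from a series for the incomplete gamma function instead of being read from Salvosa's Tables, and a plain search over a grid of skewness values replaces the successive hypotheses and linear interpolations. All function names are hypothetical. For each pair of skewness values the four Grade 4 points and the four Grade 5 points are located on their own standardized curves, and the pair is judged by how nearly one linear transformation brings them into coincidence.

```python
import math

def type3_cdf(t, a):
    """Area below t under a standardized Type III curve with skewness a >= 0."""
    if a < 0.05:
        # Sketch-level shortcut: near-zero skewness is treated as normal.
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    k = 4.0 / (a * a)              # gamma shape implied by the skewness
    x = k + math.sqrt(k) * t       # score expressed on the gamma scale
    if x <= 0.0:
        return 0.0                 # t lies below the lower end of the curve
    # Series for the regularized lower incomplete gamma function P(k, x).
    term, total, n = 1.0, 1.0, 0
    while term > 1e-17 * total and n < 20000:
        n += 1
        term *= x / (k + n)
        total += term
    log_front = k * math.log(x) - x - math.lgamma(k + 1.0)
    return min(1.0, math.exp(log_front) * total)

def type3_quantile(p, a):
    """Score t below which a proportion p of the area lies (bisection)."""
    lo, hi = -12.0, 12.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if type3_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def linear_misfit(x, y):
    """Least-squares line x = m + s*y; returns (sum of squared residuals, s)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((yi - my) ** 2 for yi in y))
    err = sum((xi - mx - s * (yi - my)) ** 2 for xi, yi in zip(x, y))
    return err, s

def fit_two_grades(p_low, p_high, grid):
    """Try every pair of skewness values on the grid; keep the best-aligned pair."""
    q_low = {a: [type3_quantile(p, a) for p in p_low] for a in grid}
    q_high = {a: [type3_quantile(p, a) for p in p_high] for a in grid}
    best = None
    for a_low in grid:
        for a_high in grid:
            err, ratio = linear_misfit(q_low[a_low], q_high[a_high])
            if best is None or err < best[0]:
                best = (err, a_low, a_high, ratio)
    return best

# Proportions below the score points 32, 42, 51 and 60 quoted in the text.
grade4 = [0.11, 0.49, 0.80, 0.95]
grade5 = [0.03, 0.22, 0.53, 0.81]
grid = [i / 10 for i in range(0, 12)]   # 0.0 to 1.1, the range of the tables
err, skew4, skew5, sd_ratio = fit_two_grades(grade4, grade5, grid)
print(skew4, skew5, round(sd_ratio, 3), err)
```

The slope returned here is the ratio of the Grade 5 standard deviation to the Grade 4 standard deviation. A grid in steps of one tenth is coarse; the residual it reports shows only how far the best pair on the grid falls short of exact coincidence, and a finer search around that pair would be needed to reach the accuracy described above.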

Thus it can be seen that four points 
common to two adjacent grades could be 
selected and a Pearson Type III Curve 
fitted to each grade such that the exactly 
appropriate proportion of cases in each 
grade fell below the specified score and 
the proportion of cases between each 
pair of selected scores used in the fitting 
will be exactly equal to the area of the 
grade curve between the corresponding 
two points. 

The initial criterion under which the 
curves were fitted was that the proportion 
of cases in each grade falling below any 
specific score shall remain invariant after 
the appropriate Type III Curves have 
been fitted to the overlapping grade frequency 
distributions. Hence, the issue 
now becomes one of whether or not other 
common non-fitted score points cut off the 
proper area in each fitted distribution. 
Intermediate points were selected and the 
Pearson Chi-Square test of “Goodness of 
Fit” was used. 

After the parameters for the two basic 
curves have been determined, other curves 
can be fitted on the same abscissa by selecting 
three points common to one of the 
basic curves, say Grade 5, and the next 
overlapping curve, say Grade 6. Since 
the skewness and standard deviation of 
the Grade 5 curve have been fixed, only the 
skewness and standard deviation of the 
Grade 6 curve remain to be found. To 
obtain values for these two parameters 
only three points are needed. Additional 
curves can be fitted in sequence by re- 
peating the procedure. Equal units then 
are defined as equal distances along the 
common abscissa. 

The method just described has been 
utilized to obtain interval scales in arith- 
metic reasoning, arithmetic computation, 
paragraph meaning, and word meaning 
extending over a range from Grade 2 to 
Grade 9, utilizing scores on the Stanford 
Achievement Tests of approximately 
50,000 cases. The process can be applied 
to many situations in which one wishes 
to transform an ordinal scale into an in- 
terval scale. It is desirable to re-empha- 
size that such an interval scale would have 
been obtained in accordance with a spe- 
cific definition. If the definition is un- 
acceptable or inapplicable, there is no de- 
fense for the scale as an interval scale. 
The use of comparable but less general 
definitions has provided serviceable interval 
scales in achievement and aptitude. 

The selection of the overlapping fre- 
quency distributions can be based upon a 
number of different characteristics. In 
addition to grade, age or levels of ability 
or adjustment might be used. Before the 
value of the method can be fully deter- 
mined, obviously, it must be applied to a 
number of situations in education and 
psychology and the stability of the result- 
ing statistics and distributions obtained 
from such units examined. However, the 
method does seem promising as a means 
of obtaining an interval scale which is 
needed for measuring growth and com- 
paring differences in situations where the 
assumption of a normally distributed 
variable is questionable and a skewed distribution 
seems more reasonable. 


It is useful to distinguish between 
parameters of personality in which varia- 
tion occurs as we go from one individual 
to the next and variables of personality 
in which variation occurs as we sample 
an individual’s behavior at different times 
and under varying conditions. The bulk 
of statistically sophisticated research in 
the field of personality and clinical investigation 
has depended upon an analysis of 
the parameters of personality, upon the 
determination of a single score to identify 
each individual on each measure. This 
may be the reason for the barrenness of 
the psychometric portrait of the individ- 
ual. Barren, that is, until the competent 
clinician organizes with intuition and 
skillful artistry his material with, at best, 
only a dim awareness of the reasons for 
this organization. 

The value of the projective techniques, 
especially the thematic projection tests, 
lies in the fact that the individual is per- 
mitted to impose his own meaning on a 
somewhat ambiguous stimulus and in so 
doing reveals relationships within his pri- 
vate world. At the same time the difficulty 
in the use of such tests is that the dis- 
tinctions between accidental relationships, 
stimulus-bound relationships, and personally 
meaningful relationships are hard 
to make and depend, again, on the artistry 
and intuition of the interpreter. To study 
covariation within the individual, it is 
first necessary to study variation within 
the individual. With primary interest in 
the science of experimental methodology 
for the research clinician rather than im- 
mediate application to clinical problems, 
the considerations leading to a method of 
studying intra-individual variation and 
covariation using personality inventories 
are here presented. 

The assumption which has been basic 
to the use of the personality inventory is 
that a given item conveys the same mean- 
ing to everyone who takes the test. For 
every reader (1) the item must suggest 
roughly the same kinds of behavior and 
therefore, barring distortion, (2) a given 
response to that item stands for a cor- 
respondingly unambiguous aspect of per- 
sonality. The frequency with which this 
basic assumption has been called into 
question as a partial explanation of the 
relative sterility of personality inventories 
in predicting an individual’s behavior, has 
led us, in this paper, to explore the consequences 
not only of denying this assumption, 
but even of affirming its opposite. 

We shall therefore proceed on the as- 
sumption that a personality self-rating 
questionnaire is in the nature of a pro- 
jective test: each item (statement for 
self-rating) serves as an ambiguous 
stimulus whose interpretation is affected 
by the subject's needs, wishes, fears, etc. 
This interpretation is expressed in the 
subject’s behavior, that is, by the encir- 
cling of one of the responses provided in 
the questionnaire form. 

Proceeding on this new assumption ab-
solves us from special consideration of
the problem of distortions, willful or 
otherwise, for we are no longer concerned 
with the self-rating as an accurate self- 
evaluation of an individual on a common 
trait as determined either by an armchair 
definition or empirically by a statistical
study. The motives that lead the subject 
to distort his self-ratings are those same 
needs, wishes, etc., that we assume operate 
in determining every response. 

Our assumption thus simplifies the 
problem in one direction yet leaves us in 
a curious plight. A filled-out question-
naire provides us with a number of “bits
of behavior,” objective in form, easily
quantifiable, each a response to a known 
stimulus, but of such barren and super- 
ficial form, that we do not know how to 
interpret them. For these responses 
might be an encircled Yes, ?, or No; an 
encircled rarely, sometimes, or usually; 
or even an encircled number on a scale, 
say, 1, 2, 3, 4, 5. 

We must accept the fact that a single 
response to a given item in isolation can- 
not be understood as an expression of the 
subject’s personality without imposing an
a priori meaning on the item, an imposi-
tion which would be contrary to our basic
assumption, that each subject provide his
own interpretation of the stimulus. In
our search for his interpretation of the
items we could, by considering all the re-
sponses of a single individual, apply
Method A, which is an analogue, using a 
single record, of the typical group stand- 
ardization of a test. 

Method A: Classification of the test
items (original stimuli) according to the
responses made. If Yes, ?, No consti-
tute the possible responses one might
search for a congruence of meaning
among the items marked Yes; among
those marked ?; and, again, among those
marked No. In this case one might also
look for opposition of meaning for the 
items in the Yes group as compared with 
those in the No group. Similarly, if there 
is a wider choice of responses, for ex- 
ample, the numerical scale 1 to 5, one 
would classify the items into five groups 
and look for a gradual change in mean- 
ing in going from the items at one ex- 
treme to those at the other extreme. 
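
The sorting step of Method A can be sketched in a few lines of Python. This is an illustrative sketch only: the item wordings, the record, and the function name `classify_by_response` are hypothetical and are not taken from any actual inventory. The search for congruence of meaning within each group remains the interpreter's task.

```python
from collections import defaultdict

def classify_by_response(record):
    """Group the items of a single record by the response encircled (Method A)."""
    groups = defaultdict(list)
    for item, response in record.items():
        groups[response].append(item)
    return dict(groups)

# Hypothetical record of one subject: item text -> encircled response.
record = {
    "I enjoy large parties": "Yes",
    "I worry about small mistakes": "No",
    "I prefer to work alone": "?",
    "I make friends easily": "Yes",
}

groups = classify_by_response(record)
for response in ("Yes", "?", "No"):
    print(response, groups.get(response, []))
```

With a numerical scale the same function applies unchanged; the keys are then the scale values 1 to 5 rather than Yes, ?, No.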


The procedure involved in Method A, 
while conforming to our basic assump- 
tion, makes another assumption which 
may well be as dubious as the one we re- 
jected at the outset. Implicit in the pro- 
cedure is the assumption that the inter- 
pretations of those stimuli which call 
forth the same response are related—pos- 
sibly dynamically equivalent. Though 
this assumption may be sound when the 
choice of responses is as great as it is in 
typical life situations, the narrow range 
of responses permitted in the test situa- 
tion makes it seem of doubtful validity. 
It is interesting to note, nevertheless, that 
we have progressed from an inter-indi-
vidual standard of consistency of mean-
ing of a response (the standard assump-
tion, rejected at the outset of this paper,
that the same responses made by differ- 
ent subjects to the same stimulus have 
similar interpretive significance) to an 
intra-individual standard (the assump- 
tion implied in Method A, that the same 
responses made by the same subject to 
different stimuli all occurring in the same 
test situation have similar interpretive 
significance).

This assumption of a relation between 
these items which call forth the same re- 
sponse is not unlike a common approach 
to the interpretation of a psychograph: 
those measures on which a subject is 
“high” (or else “extreme”) are consid-
ered to form a cluster of interrelated
variables. The relationship among the 
variables is inferred from the common 
characteristic of unusual prominence. At 
a more involved statistical level such re- 
lationships are inferred only when the 
same combinations of variables occur 
often enough to make for significant inter- 
correlations—in which case clusters or 
factors may be defined and christened. 

In contrast to our problem of the inter- 
pretation of a questionnaire as a projec- 
tive test, the above procedures involve 
inter-individual standards of comparison 
and assume accurate and valid measures 
of common traits, a requirement with 
which we have dispensed. 

To appreciate the subject’s interpreta- 
tion of the questionnaire items we must 
have some basis for grouping these ac-
cording to his similar or related reac-
tions. Our initial premise was that
grouping on the basis of standardized 
scoring derived from inter-individual 
comparisons required the unacceptable as- 
sumptions of similar interpretation by all 
subjects and similar significance of a 
given response for all subjects. A con-
sideration of Method A led us to con-
clude that grouping items on the basis
of identical (or similar) responses by the 
individual subject offers only a limited 
contribution to the interpretation of the 
personality questionnaires. Another basis 
for grouping items for a single subject 
might be the identity (or similarity) of 
the change in response from a first to a 
second administration of the test. 


Method B: Classification of the test 
items according to the shift in response 
from a first to a second administration of 
the test. If Yes, ?, No constitute the
possible responses, one might search for 
a congruence of meaning among (1) 
those items to which the response is the 
same in both tests (Yes-Yes, ?-?, and 
No-No), (2) those items for which the 
shift is in a negative direction (Yes-?,
Yes-No, and ?-No) and (3) those items
for which the shift is in a positive direc-
tion (?-Yes, No-?, No-Yes). Or again,
one might classify the items on which 
there was a shift in response according 
to shift from (1) certainty to uncertainty 
(Yes-?, and No-?), (2) uncertainty to 
certainty (?-Yes, and ?-No) and (3) 
certainty to certainty (Yes-No, and No- 
Yes). The above procedures could be 
used, as well, with a numerical scale. 
Some simplifications would result from 
equating such shifts as 5-4, 4-3, 3-2, etc., 
or again, 1-3, 2-4, and 3-5, and even 
greater simplification from defining only 
the three classes (1) no shift, (2) posi- 
tive shift and (3) negative shift. 
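
The two classifications of shifts described for Method B can be sketched as follows. This is a minimal sketch under stated assumptions: the ordering No < ? < Yes, the function names, and the four-item record are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict

# Assumed ordering of the three responses, so that a shift has a direction.
ORDER = {"No": -1, "?": 0, "Yes": 1}

def direction_class(first, second):
    """Classify a shift as none, positive, or negative."""
    diff = ORDER[second] - ORDER[first]
    if diff == 0:
        return "no shift"
    return "positive shift" if diff > 0 else "negative shift"

def certainty_class(first, second):
    """Classify a shift by movement between certainty (Yes, No) and uncertainty (?)."""
    if first == second:
        return "no shift"
    if second == "?":
        return "certainty to uncertainty"
    if first == "?":
        return "uncertainty to certainty"
    return "certainty to certainty"

def group_items(first_test, second_test, classify):
    """Group item labels by the class of their shift between two administrations."""
    groups = defaultdict(list)
    for item, first in first_test.items():
        groups[classify(first, second_test[item])].append(item)
    return dict(groups)

# Hypothetical responses of one subject to four items on two occasions.
first_test = {"item 1": "Yes", "item 2": "?", "item 3": "No", "item 4": "Yes"}
second_test = {"item 1": "?", "item 2": "Yes", "item 3": "No", "item 4": "No"}

by_direction = group_items(first_test, second_test, direction_class)
by_certainty = group_items(first_test, second_test, certainty_class)
```

For a numerical scale, the simplest three-class version would replace `ORDER` with the scale values themselves and keep only the sign of the difference.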

1. We are not including the possibility of
questioning the subject in an interview on his
interpretation of the items, although that ap-
proach has proved valuable in demonstrating the
wide variety of interpretation which occurs. If
the subject was unwilling or unable to give ac-
curate information, the interview would itself
require careful interpretation.

In using Method B we assume a rela-
tionship between the personal interpreta-
tions of those stimuli for which responses
shift in the same direction (and as a fur-
ther refinement, to the same degree). This
seems reasonable when we consider our 
basic assumption, namely, that the inter-
pretation of the ambiguous stimulus
(test item) is affected by the subject’s 
needs. If there is a change in response 
from one test administration to a second, 
there has been a change in interpretation 
of the stimulus, presumably reflecting a 
change in the intensity of the need or 
needs involved.² If the interval between
tests is short this probably reflects a tem- 
porary (reactive) shift in intensity; if 
the interval is long, it may reflect either 
a temporary or a fairly permanent shift. 
For both this reason and the one cited in 
the footnote below,² Method B appears to
be more appropriate with a short interval 
rather than a long interval between tests. 

This interpretation of shifts would not 
be of much practical use with the per- 
sonality inventories now on the market. 
Astonishingly enough, these are too re- 
liable! Such inventories have been built 
up from items that in actual practice tend 
to be responded to in the same way when
administered twice. This reliability is 
partially a function of the limited choice 
of responses: Yes, ?, No. In some cases 
it derives from the rarity of the behavior 
described by the item; in others, from 
the close relation between constitution 
and the behavior described. As a result, 
most of the responses (80% or more) 
would remain the same. There appears 
to be a sound basis for the congruence of 
items to which responses change: if there 
is a temporary (reactive) shift in the in- 
tensity of certain needs, we may expect 
those needs which form patterns and are 
related in the personality structure to 
change in some consistent unitary fashion. 
On the other hand, the corresponding 
basis for inferring a congruence between 
items to which responses do not change 
seems much more doubtful, since we 
might reasonably expect to find no tem-
porary change in several relatively unre-
lated aspects of personality at the same
time, especially when (1) the medium
through which these changes are reflected
is insensitive to change (reliable items),
and (2) these aspects have been given
only one opportunity to change (from
first to second test administration). The
second point might also apply as a criti-
cism of our interpretation of congruence
in items to which responses do change:
namely, unrelated needs might correspond
in the direction of their temporary
changes when given only one opportunity
to change, yet when given a second op-
portunity to change they might change in
opposite directions. In other words, for
Method B to be useful we must have un-
reliable items (substantial shifts in re-
sponse from first to second testing) on
which we obtain reliable patterns of shifts
(on further testing, those items which
shift in the same direction from first to
second test will continue to shift as a
group). Since our standards of reliabil-
ity and unreliability must follow our basic
assumption and derive from intra-indi-
vidual (or inter-test administration) com-
parisons, the only way in which we may
decide as to how well the data fit our
requirements is to administer the test at
least four times, so that two independent
sets of shifts may be compared.

2. We are assuming, also, that the change in
interpretation is quantitative rather than quali-
tative, an assumption that appears more valid
when using short intervals (days or weeks) be-
tween tests during which there are no major
changes in personality structure, than when us-
ing long intervals (months or years) during
which there may be major developmental
changes in personality structure.

If we must administer a personality 
questionnaire four times and still be faced 
with the possibility that Method B is in- 
applicable due to the unreliability of the 
pattern of changing responses, we might 
as well administer it ten, or twenty times 
and study the covariation of responses 
over a respectably long series. This ap- 
proach, Method C, we have named the 
Repeated Questionnaire Technique. 


Method C: Repeated Questionnaire
Technique. The test is administered ten
or more times to the same subject, using 
short time intervals (several days to a 
week). The items are first classified into 
three groups according to the degree of 
fluctuation in response over the entire 
series of test administrations, as follows: 
(a) high variability, defined as the 27% 
of the total number of items with the 
greatest intra-item variance, (b) medium 
variability, the central 46% of the items
and (c) low variability (or high consist- 
ency), the 27% of the total number of 
items with the smallest intra-item vari- 
ance. It will be noted that in this group- 
ing an intra-individual standard is set up 
for determining “high” or “low” vari- 
ances. The mean intra-item variance is 
an interesting measure on an inter-indi- 
vidual scale which is worth further study. 
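
The grouping step of Method C can be sketched as below. This is an illustrative sketch only: the four-item record over ten administrations is invented, the function name `group_by_variability` is hypothetical, and the use of the population variance (rather than a sample variance) and of simple rounding to find the 27% tails are assumptions of the sketch, not prescriptions of the method.

```python
from statistics import pvariance

def group_by_variability(series, tail=0.27):
    """Split items into high, medium, and low intra-item variance groups.

    series maps each item to its numeric responses over all administrations
    for one subject, so the standard for "high" and "low" is intra-individual.
    """
    variances = {item: pvariance(values) for item, values in series.items()}
    # Rank items from greatest to smallest intra-item variance.
    ranked = sorted(variances, key=variances.get, reverse=True)
    n_tail = round(tail * len(ranked))
    cut = len(ranked) - n_tail
    groups = {
        "high": ranked[:n_tail],
        "medium": ranked[n_tail:cut],
        "low": ranked[cut:],
    }
    # Mean intra-item variance: a single figure per subject.
    mean_variance = sum(variances.values()) / len(variances)
    return groups, mean_variance

# Hypothetical record: ten administrations, responses on a 1-to-5 scale.
series = {
    "item 1": [1, 5, 2, 4, 1, 5, 3, 5, 1, 4],
    "item 2": [3, 3, 4, 3, 3, 2, 3, 3, 4, 3],
    "item 3": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "item 4": [4, 3, 4, 5, 3, 4, 2, 4, 3, 4],
}

groups, mean_variance = group_by_variability(series)
```

With so few items the 27% tails contain one item each; a real inventory of a hundred or more items would give tails of usable size.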

The items in group a (high variability)
may be characterized as “dynamically
sensitive,” by which we mean that the 
needs governing the interpretation of 
these items are easily influenced by the 
ordinary course of external events that 
occur from day to day. As a result, our 
sampling technique of test administra- 
tions scattered over a month or more 
catches these needs at varying levels of 
intensity. These dynamically active needs 
constitute an area of personality with 
permeable barriers, to use the Lewinian 
terminology. It is among these items 
that our technique permits the study of 
relationships in terms of covariation of 
the several items. Obviously, items which 
do not vary (group c) cannot covary. 
The practical problems involved in this 
study of interrelationships are many. 
There is the choice to be made of a sta- 
tistical coefficient of relationship. There 
is the determination of the minimal num-
ber of administrations for achieving re-
liable coefficients. For exploratory pur-
poses the product-moment correlation co-
efficient is probably the best choice, but
under satisfactory conditions there is no 
reason why more elaborate measures of 
variance ratios which correspond more 
closely to clinical concepts of dynamic re- 
lationships should not be used. There is 
also the problem, similar to that encoun- 
tered with Method B, of obtaining re- 
sponses which are extremely unreliable, 
or for our purposes, perhaps we should 
use the word “sensitive” rather than “un-
reliable.” This can be achieved mechani-
cally, as studies of rating scales have
shown, by increasing the numerical scale
to an awkward length (more than seven
points) and by ambiguity about the stand-
ard of comparison which such ratings im-
ply.

The items in group b are probably least
rewarding for the study of relationships,
nor are they sufficiently consistent and un-
varied to be interpreted after the fashion
of the items in group c. These latter form 
a group of items which, for their interpre- 
tation, depend on needs which apparently 
are not sensitive to the ordinary day to 
day occurrences, and therefore do not 
result in much of a change, if any, over 
the series of tests. These items lend 
themselves to interpretation in at least 
two ways. On the one hand, they may 
represent needs whose expressions are so 
rigid that they are impermeable to out- 
side influences. On the other hand, they 
may represent needs which are so well 
integrated into the core of the personality 
that these also are relatively uninfluenced 
by minor external changes. 
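
The study of covariation among the high-variability items, for which the product-moment coefficient was suggested above as an exploratory measure, might be sketched as follows. The item labels and response series are hypothetical, and the function names are introduced only for illustration. An item that never varies is given no coefficient, in keeping with the remark that items which do not vary cannot covary.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equally long response series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # an item that does not vary cannot covary
    return sxy / sqrt(sxx * syy)

def covariation_table(series, items):
    """Correlate every pair of the chosen items over the administrations."""
    return {
        (a, b): pearson_r(series[a], series[b])
        for a, b in combinations(items, 2)
    }

# Hypothetical responses of one subject over five administrations.
series = {
    "item 1": [1, 2, 3, 4, 5],
    "item 2": [2, 4, 6, 8, 10],
    "item 3": [5, 4, 3, 2, 1],
    "item 4": [3, 3, 3, 3, 3],
}

table = covariation_table(series, ["item 1", "item 2", "item 3", "item 4"])
```

Five administrations are far too few for stable coefficients; the series is kept short here only so that the arithmetic can be checked by hand.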


SUMMARY


On the basis of the above considera-
tions it appears that a denial of the basic
assumption in the use of the personality 
inventory, namely, that a given item con- 
veys the same meaning to everyone, and 
an affirmation of its opposite, namely, 
that each item serves as an ambiguous 
stimulus whose interpretation is affected 
by the subject’s needs, results in three 
possible methods for interpreting the in- 
dividual’s past record. On the basis of a 
consideration of the logical implications 
of each of these methods, it appears that 
Method C, the repeated test technique, 
offers the most promise for a meaningful 
approach to the interpretation of person- 
ality inventories. A more general appli- 
cation of this kind of approach to other 
materials turns, in effect, each individual 
into a statistical research problem for the 
clinical definition of his personal struc- 
ture in his own terms.


INTRODUCTION


The transformation of qualitative data 
obtained in interviews, autobiographies, 
free-answer responses to open-ended 
questions, projective materials, and ob- 
servation of group situations into a form 
which renders them susceptible to quanti- 
tative treatment constitutes coding. The 
clinician and social psychologist increas- 
ingly use coding procedures to obtain 
more rigorous statistical demonstration of 
their hypotheses. Little systematic think-
ing has been done about coding; this paper
is an attempt to make preliminary formu- 
lations. 

* Publication No. 1 of the Conference Re- 
search project at the University of Michigan 
sponsored by the Office of Naval Research 
(Contract N6onr-232, T. O. 7), under the gen- 
eral direction of Dr. D. G. Marquis, chairman of 
the Psychology Department. The writer is 
grateful to his colleagues, Drs. A. Campbell, 
C. Coombs, and C. C. Craig, for aid in develop- 
ing this paper. 


The coding of qualitative data involves 
two operations, that of separating the 
qualitative material into units, and that of 
establishing category-sets into which the 
unitized material may be classified. The 
fruitfulness of the transformation depends 
upon the ingenuity and insight with which 
the experimenter chooses his units and 
category-sets. The reliability of the cod- 
ing depends upon the accuracy with which 
the unitizing and subsequent classifying 
are carried out. This paper first will con- 
sider general characteristics of units and 
category-sets which may be helpful in for- 
mulating coding schemes. Then it will 
present procedures to evaluate the reliabil- 
ity with which the coding has been done. 


CHARACTERISTICS OF CATEGORY-SETS AND
UNITS

The development of a set of categories
into which the qualitative material may be
classified is always accompanied explicitly
or implicitly by a decision as to the size of
the unit into which the material shall be
divided before it is categorized. Yet, se- 
lection of unit size seems more dependent 
upon the category-set employed than 
choice of category-set depends upon unit 
size. Thus, for purposes of exposition, it 
is convenient first to discuss characteris- 
tics of category-sets and to develop a vo- 
cabulary in terms of which category-sets 
may be described. 


Category-sets. A category-set consists 
of a number of classes or “pigeon holes” 
into which the units of qualitative data 
may be placed. When the category-set is 
intended to provide for classification of 
each and every unit of the data, it may be
termed exhaustive. Sometimes residual
categories, such as “not ascertained,” “no 
answer,” or “don’t know” must be in- 
cluded in the set to make it exhaustive. 
Rosenzweig’s classification of the direction 
of aggression in responses to his picture-
frustration test provides an exhaus-
tive coding schema, in that each unit may
be characterized as intropunitive, extra-
punitive, or impunitive. His category-
set becomes exhaustive by supplementing
a dichotomous set (intropunitive vs. ex-
trapunitive) with a residual category (im-
punitive).

Sometimes category-sets make no at- 
tempt to classify all of the units into a 
single schema and are far from being ex- 
haustive. The Survey Research Center of 
the University of Michigan designates 
such as sieve codes, because they act as a 
straining device by which the entire bulk 
of the qualitative data is combed for cer- 
tain infrequently appearing items. A sieve 
category-set is sometimes applied to inter- 
view data, when the analyst wishes to ob- 
tain information which was elicited casu- 
ally, as material tangential to the particular 
question asked. For instance, questions to 
farmers about crop plans in an agricul- 
tural survey may be sieved for value- 
oriented comments about government poli- 
cies. There is no fundamental difference 
between a sieve code and an exhaustive 
code. In using a sieve code the analyst is 
merely not making formal acknowledg- 
ment of a large residual category, “no 
mention.” 
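
The equivalence of the two kinds of code can be seen in a small sketch. Everything in it is hypothetical: the interview fragments are invented, and `classify_policy_comment` is a toy keyword rule standing in for the judgment of a trained coder. The point illustrated is only that the units a sieve code passes over are exactly those an exhaustive code would place in the residual category.

```python
def classify_policy_comment(unit):
    """Toy stand-in for a coder's judgment: a category label, or None."""
    text = unit.lower()
    if "government" not in text:
        return None
    if "unfair" in text:
        return "critical of government policy"
    return "other mention of government policy"

def code_exhaustive(units, classify, residual="no mention"):
    """Every unit receives a category; unclassified units go to the residual."""
    coded = []
    for unit in units:
        category = classify(unit)
        coded.append((unit, category if category is not None else residual))
    return coded

def code_sieve(units, classify):
    """Only units that fall into a category are retained."""
    coded = []
    for unit in units:
        category = classify(unit)
        if category is not None:
            coded.append((unit, category))
    return coded

# Hypothetical answers to a question about crop plans.
units = [
    "I plan more corn next year.",
    "The government price supports are unfair.",
    "Maybe some oats, depending on rain.",
    "The government program helped us last year.",
]

exhaustive = code_exhaustive(units, classify_policy_comment)
sieved = code_sieve(units, classify_policy_comment)
residual_count = sum(1 for _, category in exhaustive if category == "no mention")
```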

The categories within a set are on occa-
sion all derived from a single frame of
reference. When such is the case, the set
may be considered as uni-dimensional.
When developing a code, new categories 
are added sometimes to handle each non- 
comparative datum obtained. This tends 
to make the set multi-dimensional. It is 
important to avoid modifying a schema by 
introducing a new dimension when adding 
new categories during the early stages of 
code refinement. In constructing the final
frame of reference into which the cate- 
gories are placed, it is necessary to work 
back and forth from the material to the 
categories and to the frame of reference. 

Sometimes an ostensibly uni-dimen- 
sional category-set actually consists of two 
or more sets. For instance, Staton,
in analyzing the statements of individuals
in seminar discussion, used the following
schema:

 1. Introduces point, opinion, or idea.
 2. Develops, interprets, or corroborates
    point, opinion, or idea.
 3. Agrees.
 4. Challenges or questions value of
    point, opinion, or idea.
 5. Asks elaboration of point, opinion, or
    idea.
 6. Asks approval of point, opinion, or
    idea.
 7. Cites factual information or example.
 8. Asks for information.
 9. Invites suggestion.
10. Calls on individual.
11. Summarizes or re-words.
12. Suggests procedure.
13. Challenges or questions procedure.
14. Defines problem conditions.
15. Is interrupted.


There are a number of frames of reference 
employed in this set. For instance, Cate- 
gory 15 differs from all the others in that 
it describes whether the participant had 
opportunity to complete his contribution;
a statement classified into any of the four- 
teen other categories might also be inter- 
rupted. A third frame of reference is 
easily found in the difference between 
Categories 12 and 13 which are concerned 
with procedural statements, as contrasted 
with the remaining categories (except 15) 
which are used to describe substantive 
statements. Categories 4 and 13 are iden-
tical, except for this difference; Cate-
gories 1 and 12 hold a similar relation-
ship. The fact that no categories parallel
the substantively oriented Categories 2, 3, 
5, 6, 7, 8, 9, 11, and 14 means those classes 
would contain a mixture of procedural 
and substantive statements. There are at 
least two additional frames of reference 
used in the development of the list of 
categories, making a total of five dimen- 
sions within the system. Such multi-di- 
mensionality of a category-set makes the 
coding difficult, and promotes the inter- 
mixing of “double-coding” or “triple- 
coding” of some units and “single-coding” 
of other units. This multiple coding of 
single units introduces complications in 
the subsequent quantitative interpretation 
of the classified results, as comparisons 
between frequencies in the classes are 
then based upon varying numbers of units. 
This difficulty may be reduced by estab-
lishing an additional second or even third
category-set which is derived from the
frame of reference. Then each unit would
be coded into each category-set, with a re-
duction of the confusion. This simul-
taneous application of more than one ex-
haustive category-set to each unit of data
may be designated as multiple-coding. An
example is found in Rorschach’s use
of three category-sets in scoring each re-
sponse to ink-blots on the basis of its lo-
cation (W, D, Dd, and S), its determi-
nants (F, C, CF, FC, M) and its content
(H, A, Obj, Lads, etc.).
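
Multiple-coding in this sense can be sketched as follows. The three scored responses are invented for illustration, the function name is hypothetical, and the content set is deliberately truncated to three symbols because the list given above ends in "etc." The sketch shows the property that matters for later analysis: since every unit is coded once in every category-set, the frequencies within each set rest on the same number of units.

```python
from collections import Counter

# Category-sets taken from the symbols listed in the text.
CATEGORY_SETS = {
    "location": {"W", "D", "Dd", "S"},
    "determinant": {"F", "C", "CF", "FC", "M"},
    "content": {"H", "A", "Obj"},  # truncated for the sketch
}

def tally_multiple_coding(coded_units):
    """Tally each category-set, requiring one valid code per set per unit."""
    tallies = {name: Counter() for name in CATEGORY_SETS}
    for unit in coded_units:
        for name, allowed in CATEGORY_SETS.items():
            code = unit.get(name)
            if code not in allowed:
                raise ValueError(f"unit lacks a valid {name} code: {unit!r}")
            tallies[name][code] += 1
    return tallies

# Hypothetical scoring of three responses.
coded_units = [
    {"location": "W", "determinant": "F", "content": "A"},
    {"location": "D", "determinant": "M", "content": "H"},
    {"location": "D", "determinant": "FC", "content": "Obj"},
]

tallies = tally_multiple_coding(coded_units)
```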

One of the causes of multi-dimension- 
ality of a category-set is to be found in 
the mixing of categories of differing levels 
of generality. Under each category within 
a set, it usually is possible to construct 
another set of more specific categories. In 
practice when many “not classifiable” 
units are obtained, the analyst may look 
for another, all-inclusive frame of refer- 
ence for the construction of a supraordi- 
nate category-set which is more exhaus- 
tive. The complexity of the category-sets 
used will depend upon the amount of data 
which is to be handled and the hypotheses 
being explored in the analysis. 

It is sometimes proposed that the most 
effective compromise between the desire to 
limit the number of categories and the
wish to retain the full richness of be-
havioral responses can be made by using a
large number of separate categories dur-
ing the coding and combining them into
broader classifications in subsequent anal- 
ysis steps. This procedure is actually much 
less satisfactory than it would appear, 
since the combining of categories which 
have been devised with no logical inter- 
relation leads to unnatural and unwieldy 
combinations for which proper definition 
is very difficult. In Lazarsfeld’s treat-
ment of the problem of category pro-
liferation, he demands that the analyst
keep foremost the purpose for which the
qualitative material is being quantified.
Each category then must prove its con-
ceptual meaningfulness in terms of the
category-set’s frame of reference before it 
is admitted to the schema. Another help- 
ful rule of thumb is to begin with a unify- 
ing frame of reference in terms of which 
only large, general categories are admitted 
into the schema. Then as it becomes ap-
parent that reliable and meaningful dis-
tinctions can be made, the original cate-
gory areas may be re-worked, readjusting
the scope of each category, so that its bor- 
ders are less fuzzy. 

Subcategory-sets may be articulated as 
specifications of the category itself. These 
subcategory-sets may be dropped if they 
prove to be too minute to carry enough 
occurrences to warrant the additional
breakdown. Whenever richness of the
data is lost by the “lumping” of many 
units into a single broad category, the 
simultaneous employment of subcategory- 
sets along with the exhaustive, supraordi- 
nate category-set is advantageous. The 
subcategory-set need only be exhaustive 
of the units classified within its parent 
supraordinate category. 

Category-sets may be distinguished 
from each other on the basis of the inter- 
relationships of the categories to each 
other within the set. Some categories 
blend into each other. They can be ar- 
ranged on a continuum in order of magni- 
tude. The category-set then constitutes a 
scale. When the categories within a
set cannot be so arranged, they may be
designated discrete categories. Some cod-
ing schema resort to mixed categories to
handle the problem of overlapping cate-
gories, as Rorschach did in establishing
CF and FC categories in addition to the
Form, Color, and Movement categories
within his determinants response cate-
gory-set.




Unitizing. The selection of the amount 
of material to be included in each unit is 
dependent upon two factors: the way in
which the qualitative material has been 
gathered and the demands imposed by the 
category-set to be used in classifying. Be- 
cause unitizing is intimately tied up with 
the data collection procedure and the sub- 
sequent classification process, the opera- 
tion sometimes is not recognized as a prob- 
lem in coding. For instance, analysis of 
open-ended interviews often directly pro- 
ceeds by using each question-response as 
a unit. However, when a question asking 
the reasons for particular attitudes is 
asked, the coder then begins to recognize 
units. He classifies each “reason” unit as 
a separate entity. In the Rorschach test, 
the problem of unitizing at times becomes 
quite difficult, when separate responses 
are not clearly distinguished by the sub-
ject (pp. 65-76). Material obtained by con-
tinuous observation of verbal and non-
verbal behavior, as illustrated by Stein-
zor in his observation and recordings
of group therapy sessions, is very difficult
to unitize. Thus, the extent to which the
data collection procedure structures the 
units is an important determinant of the 
size used in coding. 

The demands made by the category-set 
in the classifying process are also impor- 
tant determinants. If the size of the unit 
is not appropriate to the category-set, 
difficulty in classifying the unit materials
is increased. For instance, when the unit 
is too large, the material may consist of 
two sub-parts which are classifiable into 
different categories. This condition makes 
the choice of category slow and promotes 
ambiguity in the meaning of the category 
itself. If the unit is too small, the amount 
of material placed in the residual category, 
“not ascertainable,” increases. To correct 
these tendencies, the size of the unit may 
be made more appropriate to the demands
of the category-set. 

The size of the unit governs the fre- 
quency with which repeated items occur- 
ring in close proximity to each other are 
classified as separate events. For instance, 
White employs the sentence as the unit
in his value-analysis of autobiographical
materials. Although a given value may
be mentioned more than once in a sen- 
tence, the sentence is tallied only once for 
this value. When a given value is men- 
tioned in three out of the four sentences 
constituting a paragraph, three tallies are 
made in the given category. In his analy-
sis of typescripts of conferences, Heyns
found it useful to allow a larger “problem-
solving function” unit to determine the fre-
quency with which closely reiterated “I”
and “we” references were counted. He
first unitized the typescript with respect 
to functional units. Then he determined 
whether at least one reference to the indi-
vidual speaking (“I”) or to the group
(“we”) occurred within a unit. In this
way, the spurious effects due to verbal
circumlocutions (“I think that I would
...”) and hesitations (“I would ... I
would ...”) are eliminated. Use of a unit
larger than the item being classified at 
times helps in establishing a psychologi- 
cally more meaningful code. 
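
The effect of unit size on the tally can be illustrated with a short sketch. The three units of transcript are invented, the function names are hypothetical, and a regular expression stands in for the coder's recognition of an “I” reference. Counting every mention inflates the tally through circumlocution and hesitation; counting each unit at most once, as in the procedures described above, does not.

```python
import re

def count_mentions(units, pattern):
    """Raw count: every occurrence of the item is tallied."""
    return sum(len(re.findall(pattern, unit)) for unit in units)

def tally_by_unit(units, pattern):
    """Unit-based count: a unit is tallied once if the item occurs in it at all."""
    return sum(1 for unit in units if re.search(pattern, unit))

# Hypothetical units from a conference typescript.
units = [
    "I think that I would agree.",
    "I would ... I would wait.",
    "The plan seems sound.",
]

self_reference = r"\bI\b"
raw = count_mentions(units, self_reference)      # 4 mentions
per_unit = tally_by_unit(units, self_reference)  # 2 units
```

The same two functions apply whether the unit is the sentence or a larger functional unit; only the way the text is divided into `units` changes.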

The size of units determines the molec- 
ular-molar level at which behavior is 
analyzed. Units may be very small and 
atomistic, as are demanded in the classi-
fication of vowel and consonant sounds in
the study of speech behaviors. They
may be very wholistic and global, as when
French classified the over-all “interde-
pendence” of groups. Sometimes the be-
havior is coded into categories using units
of one size, and then these units are 
summed to characterize larger units. This 
procedure has been commonly used by 
genetic psychologists in their observa-
tions of children’s behaviors. The units
of behavior occurring within various cate- 
gories are summed for given individuals 
over given periods of time. For instance, 
the frequency of specific behavioral acts 
during each five minutes has often been 
used to describe behavior. For purposes
of analysis, further super-units are intro- 
duced by separately summating the fre- 
quency of category units for individuals 
within certain time periods. This simul- 
taneous use of a number of unit levels to 
make the richness of the data available is 
parallel to the simultaneous use of a num- 
ber of category-sets at different levels of 
generality, as described above.


RELIABILITY


Once insightful methods of coding have 
been adopted, the experimenter faces two 
problems concerning reliability. He must deter-
mine the accuracy with which the coding
procedure has been applied to the data, 
and he must test the extent to which his 
sample of data is sufficiently large and ade- 
quately chosen to yield reliable characteri- 
zation of the behavior being studied. This 
latter problem has been well explored in 
studies of sampling. This paper will con-
cern itself only with the first problem,
namely, the accuracy with which the cod- 
ing schema is applied. 


Reliability of Categorizing. When two
coders have classified a given number of 
units of qualitative material into a set of 
categories, it is possible to compute the 
proportion of items upon which the coders 
agree. From this proportion, experi- 
menters would like to estimate the ac- 
curacy with which the units have been 
classified. The proportion of units upon 
which two coders agree may be con- 
ceived as the sum of those items which 
both coders correctly classify and those 
items which both coders incorrectly classify
in the same incorrect way. If $p_{cu}$
is the probability with which a coder $c$ correctly
classifies any unit $u$, then the
probability of two coders ($c$ = 1 or 2)
correctly classifying the unit is $p_{1u}p_{2u}$.
If $q = 1 - p$, the probability of a coder
incorrectly classifying a unit is $q_{cu}$. The
probability that the other coder will incorrectly
classify the unit in the same wrong
category is $1/(k-1)$, if $k$ represents the number
of categories in the set. The joint probability
of both coders classifying the unit in the
same incorrect category is $q_{1u}q_{2u}/(k-1)$. Thus,
the proportion ($P$) of $n$ units upon which
two coders will agree may be written,

\[ P = \frac{1}{n}\sum_{u=1}^{n}\left[p_{1u}p_{2u} + \frac{q_{1u}q_{2u}}{k-1}\right] \tag{1} \]

which may be simplified to

\[ P = \frac{1}{n(k-1)}\sum_{u=1}^{n}\left[k\,p_{1u}p_{2u} - p_{1u} - p_{2u} + 1\right] \tag{2} \]

Equation (2) holds under the assumption
that classification of any one unit is inde- 
pendent of the classification of any pre- 
ceding or subsequent unit. 

If the probability of correct classification
is assumed to be the same for all units,
$p_{cu}$ is the same for all units. Then equation
(2) may be simplified to the following
form:

\[ P = \frac{k}{k-1}\,p_1p_2 + \frac{1 - p_1 - p_2}{k-1} \tag{3} \]

If the two coders have the same ability
($p_1 = p_2$), equation (3) reduces to a
simple quadratic,

\[ P = \frac{k}{k-1}\,p^2 - \frac{2}{k-1}\,p + \frac{1}{k-1} \tag{4} \]

Equation (4) holds at the extreme
limits of coding ability. When the coders
are unskilled and randomly classify the
units into the categories, $p$ equals $1/k$. Substituting
$1/k$ for $p$ in Equation (4), $P$ becomes
$1/k$. When the coders are completely
accurate, $p$ equals 1, as does $P$.
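
The two limiting cases can be checked numerically. The sketch below assumes Equation (4) reads P = (kp² − 2p + 1)/(k − 1), the form that agrees with both stated limits and with the worked example given later in this section; the function name `agreement` is only illustrative.

```python
def agreement(p: float, k: int) -> float:
    """Theoretical proportion of agreement P between two equally able coders.

    Assumed reading of Equation (4): P = (k*p^2 - 2*p + 1) / (k - 1),
    where p is the probability of a correct classification and k is the
    number of categories.
    """
    return (k * p * p - 2.0 * p + 1.0) / (k - 1)


# Chance-level coders (p = 1/k) and perfect coders (p = 1) for a few k.
for k in (2, 5, 10):
    print(k, agreement(1.0 / k, k), agreement(1.0, k))
```

For each k the first printed value is 1/k and the second is 1, up to floating-point rounding.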

Note that the first term in Equation 
(4) increasingly dominates the determi- 
nation of P as k becomes larger, that is 
when the number of categories is large. 
The same condition prevails when rela- 
tively skilled coders are employed, that is, 
when p is large. When k is greater than 
5, increases in the number of categories 
have a relatively small effect upon the 
amount of agreement obtained between 
coders of constant ability. The graphs of 
Equation (4) presented in Figure 1 for
various values of k vividly illustrate this 
generalization. This relationship is of 
practical importance to the experimenter. 
It indicates that agreement between coders
in category-sets involving more than five
categories is largely determined by the
probability of both coders selecting the
“correct” category.

HAROLD GUETZKOW

[Figure 1. Categorizing reliability: theoretical agreement between coders (P, lower limit) on the ordinate against theoretical correctness of classification (p, lower limit) on the abscissa, with one curve for each of various values of k.]

The proportion of agreements between coders
actually obtained by an experimenter ($P'$) is a single
estimate of the corresponding theoretical
agreement ($P$). However, the expected
limits of the theoretical proportion of
agreement can be calculated through a
t-test of the difference between $P$ and
$P'$, namely

\[ t = \frac{P' - P}{\sqrt{PQ/n}}, \quad \text{where } Q = 1 - P. \tag{5} \]

This quadratic equation may be solved for
P so that a general expression for the
range of P corresponding to a given P’ for
different values of n and t is obtained,
namely

\[ P = \frac{t^2 + 2nP'}{2(t^2+n)} \pm \frac{\sqrt{(t^2+2nP')^2 - 4(t^2+n)\,n\,(P')^2}}{2(t^2+n)} \tag{6} \]


The values of all the variables in the right- 
hand member of the equation are known 
by the experimenter. For any given value 
of P’, the lower limit of P can be calculated
for various magnitudes of t and n.
The relationships among P’, P, and
n are presented in Figure 2 for t's at the
1% and 5% levels of significance, where
t = 2.58 and 1.96.

A practical example of the use of Equations
(4) and (6) follows. Suppose an
estimate of the lower limit of p at the 1%
level is desired, when the proportion of
agreements between the two coders was
.80 on 100 units of qualitative data in a
category-set consisting of 10 classes. By 
substituting P’ = .80, t = 2.58, and n = 
100 in Equation (6), the lower limit of P 
is found to be .68. By substituting P = 
.68 in Equation (4), the least value p 
might take 99 out of 100 times is found to 
be .82. 

Figures 1 and 2 can be used to obtain 
a graphic estimate of the value of p as follows.
In Figure 2, the curve labelled P’ = .80
is used. With n = 100 on the abscissa
for the 1% level scale, P is read on the 
ordinate as .68. Then going to Figure 1, 
the curve for k = 10 is used. With P on 
the ordinate at .68, the value of p on the 
abscissa is found to be .82.

What effect does the number of units 
classified have upon the estimates of p? In
general, the larger the number of units used
as a basis for calculating P’, the larger are
the minimal estimates of p corresponding 
to given values of P’. The exact relation- 
ships may be obtained by solving Equation 
(5) for n as a function of P. The results 
are of practical consequence, as inspection 
of Figure 2 suggests that increases in
the lower limit values of P are not very 
rapid when more than 150 units are used 
for obtaining P’. Hence, experimenters 
need not have more than 150 units of the 
qualitative material classified by two 
coders to obtain stable estimates of the 
probability with which each unit is clas- 
sified correctly. Sometimes the coding 
operation extends over a long time period, 
with possible shifts in the standards of the 
coder. Under such circumstances, periodic 
checks of masses of 150 units are needed. 
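
The flattening of the lower limit as n grows can be seen by evaluating Equation (6) at several sample sizes. This is a sketch only: the observed agreement of .80 and the list of sample sizes are illustrative choices, not values taken from Figure 2.

```python
import math


def lower_limit_P(P_obs: float, n: int, t: float) -> float:
    """Lower root of Equation (6) for observed agreement P_obs on n units."""
    a = t * t + 2.0 * n * P_obs
    disc = a * a - 4.0 * (t * t + n) * n * P_obs ** 2
    return (a - math.sqrt(disc)) / (2.0 * (t * t + n))


# Hypothetical check-coding sizes, all with P' = .80 at the 1% level.
sample_sizes = [50, 100, 150, 200, 300]
limits = [lower_limit_P(0.80, n, 2.58) for n in sample_sizes]
for n, low in zip(sample_sizes, limits):
    print(n, round(low, 3))
```

Each additional block of 50 units raises the lower limit by less than the block before it, which is the pattern the text reads off Figure 2.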

If three coders classify the same quali- 
tative data into a category-set consisting 
of a relatively large number of categories, 
it is possible to estimate the abilities of 
each coder. To do this, Equation (3) must 
first be simplified and then a set of simultaneous
equations derived. Equation (3)
may be simplified by subtracting the quantity
$p_1p_2/(k-1)$ from the first term and adding
the same quantity to the second term of
the right-hand member of the equation, so
that it becomes

\[ P = p_1p_2 + \frac{(1-p_1)(1-p_2)}{k-1} \tag{7} \]

When a large number of categories are
used, the second term of the right-hand
member of Equation (7) may be disregarded,
so that

\[ P = p_1p_2 \tag{8} \]


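Equation (8) may be written for the three
combinations of coders, as $P'_{12} = p_1p_2$;
$P'_{13} = p_1p_3$; and $P'_{23} = p_2p_3$. These
three equations may be simultaneously
solved for the following estimates of coder
reliability:

\[ p_1 = \sqrt{\frac{P'_{12}P'_{13}}{P'_{23}}}, \quad p_2 = \sqrt{\frac{P'_{12}P'_{23}}{P'_{13}}}, \quad p_3 = \sqrt{\frac{P'_{13}P'_{23}}{P'_{12}}} \tag{9} \]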
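
A minimal sketch of this three-coder solution follows. It assumes the large-k approximation of Equation (8), so that each pairwise agreement is simply the product of the two coders' accuracies; multiplying two of the equations and dividing by the third isolates each accuracy. The agreement values are hypothetical, generated from accuracies of .9, .8 and .7.

```python
import math


def coder_abilities(P12: float, P13: float, P23: float):
    """Estimate p1, p2, p3 from the three pairwise agreement proportions,
    assuming P_ij = p_i * p_j (Equation (8), many categories)."""
    p1 = math.sqrt(P12 * P13 / P23)
    p2 = math.sqrt(P12 * P23 / P13)
    p3 = math.sqrt(P13 * P23 / P12)
    return p1, p2, p3


# Hypothetical agreements: 0.9*0.8, 0.9*0.7, 0.8*0.7
estimates = coder_abilities(0.72, 0.63, 0.56)
print([round(p, 3) for p in estimates])
```

The script returns the three accuracies from which the agreements were built, which is a quick consistency check rather than evidence that the approximation holds for real coders.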


By using four or more coders, it is pos- 
sible to write more observational equa- 
tions (similar to those just presented) 
than there are unknowns. This would 
enable an empirical check to be made on 
the assumptions and approximations un- 
derlying Equation (3). 


Reliability of Unitizing. There are two 
types of errors likely to be made by coders 
in unitizing massed, verbal material. (1) 
A given length of material may be broken 
into units at different points, such that the 
units obtained by each coder are equal in 
number, but the units may not be coter- 
minous. (2) Sometimes one coder may re- 
gard a given segment as containing two or 
more units, while another coder may re- 
gard the same segment as a single unit. In 
this case, the number of units obtained by 
the coders will not be equal. The process 
of unitizing may be likened to the problem 
of breaking a long chain of beads into 
short chain segments. The units within 
the chain may be thought of as being beads 
of different colors, which shade into each 
other. When the change of colors of a 
segment of the chain is abrupt, the problem
of distinguishing one unit from
another is easy and few errors occur. 
When the change is gradual with the mid- 
segment color clear and unambiguous, the 
first type of error tends to appear, in that 
two coders will sever the segment at dif- 
ferent points but end up with the same 
number of segments. When the change 
from color to color is very gradual and 
there is some variation of color even 
within the segment itself, the coder may 
sever the segment in its middle, obtaining 
two units rather than one. In making this
second type of error, each coder ends up 
with a different number of segments. 

In practice, errors in the number of 
units obtained are of more significance 
than absence of coterminability, as the 
latter less often leads to errors in the clas- 
sification of the response in the subsequent 
categorizing process. When two coders 
unitize a given bulk of material, compari- 
son of the number of units obtained by 
each may be made. This is not an exact 
measure of the second type of error, as 
there may be some compensation of 
double-units in one segment of the material
by failure to separate two other segments.
However, it serves as a practical
approximation to the amount of error 
present, especially if one is willing to as- 
sume that such compensating errors are 
related in no systematic way to the subse- 
quent categorizing process. Comparison 
of the number of units obtained by the two 
coders constitutes a basis for evaluating 
the reliability of the unitizing. 

The reliability of the unitizing process 
(U) may be characterized by expressing 
the difference between the coders as a per- 
centage of the sum of the number of units 
obtained by each coder, that is, 

\[ U = \frac{O_1 - O_2}{O_1 + O_2} \tag{10} \]

if $O_1$ represents the number of units obtained
by the first coder, and $O_2$ is the total
obtained by the second coder. One approach
to the meaning of various magnitudes
of this ratio is found in an application
of Geary’s frequency distribution of
the quotient of two normal variates.
For this purpose, Equation (10) is conceived
as the quotient of two variables, by
letting $O_1 - O_2 = Y$, and $O_1 + O_2 = X$.
Before applying Geary’s distribution 
function to the ratio of Y to X, it will be 
found useful to explore the properties of 
Y and X and make a number of simplify- 
ing assumptions. It is convenient and rea- 
sonable to assume that the two coders are 
of equal ability. This implies that the expected
long run value of $O_1$ and $O_2$ upon
repeated trials would be equal to each
other, i.e.,

\[ E(O_1) = E(O_2) = h \tag{11} \]

where h is some constant, and that the
variances of the distributions of the $O_1$’s
and $O_2$’s are equal, i.e.,

\[ \sigma_{O_1} = \sigma_{O_2} = \sigma \tag{12} \]

With the above conditions

\[ E(Y) = E(O_1 - O_2) = 0, \tag{13} \]

and

\[ E(X) = E(O_1 + O_2) = 2h. \tag{14} \]

Inasmuch as $O_1$
and $O_2$ are assumed to be independent of
each other, then taken with Equation (12)
above,

\[ \sigma_Y = \sigma_X = \sigma\sqrt{2}. \tag{15} \]
In order to use Geary’s distribution, it is
necessary to assume that $O_1$ and $O_2$ are
normally distributed. This condition is 
probably met in practice inasmuch as it 
demands a normal distribution of unitiz- 
ing errors upon repeated application of the 
procedure. 

In the notation used above, Geary’s frequency
distribution of the quotient of two
variables is

\[ \frac{\sqrt{2}\,h}{\sigma}\cdot\frac{U}{\sqrt{1+U^2}} \tag{16} \]

This quantity is approximately normal,
with a mean of zero and unit variance.
The distribution holds only when the ratio
$\sqrt{2}\,h/\sigma$ is large. In the unitizing problem,
this assumption is warranted, as the average
value of h is always much larger than
$\sigma$ for trained coders. The factor $h/\sigma$ in (16)
represents the accuracy of the coders. The
reciprocal of $h/\sigma$ is a Pearson coefficient of
variation, approaching a value of zero as
the ability of the coders to unitize approaches
perfection. For this reason, it is
conceptually convenient to work with $\sigma/h$.

Geary’s distribution is expressed in the
usual form of the normal binomial distribution
curve and approximates it. Hence
(16) may be set equal to “t” and utilized
in estimating the possibility of obtaining
particular values of U for given values
of $\sigma/h$. This equation is of practical value
in estimating the coder accuracy which is
associated with obtained values of U. For
instance, if the experimenter keeps the
accuracy of his coders at $\sigma/h$ = .05, U will
exceed .09 by chance only once each 100
times, and .07 only once each 20 times.
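
These two figures can be reproduced by setting expression (16) equal to t and solving for U. The sketch assumes (16) has the form (√2 h/σ) · U/√(1 + U²), which follows from E(X) = 2h, E(Y) = 0 and equal standard deviations σ√2 for X and Y; the function name `critical_U` is illustrative.

```python
import math


def critical_U(sigma_over_h: float, t: float) -> float:
    """Value of U reached at a given normal deviate t when the coders'
    coefficient of variation is sigma_over_h (assumed form of (16))."""
    r = t * sigma_over_h / math.sqrt(2.0)  # r = U / sqrt(1 + U^2)
    return r / math.sqrt(1.0 - r * r)


print(round(critical_U(0.05, 2.58), 2))  # 1% level -> 0.09
print(round(critical_U(0.05, 1.96), 2))  # 5% level -> 0.07
```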


Because of the compensating nature of
the errors described above in the second
paragraph of this section, the value of $\sigma/h$
is dependent upon the amount of the material
upon which the empirical value of
U is based. If the coding from which the
U value is obtained approximates 100
units of material, increasingly small estimates
of U may be obtained by doubling
or tripling the amount of material.

The implicit influence of this factor in
(16) can be taken into consideration by
writing factor $h/\sigma$ as follows:

\[ \frac{Nh}{\sigma\sqrt{N}} = \frac{h\sqrt{N}}{\sigma} \tag{17} \]

where N is the number of such masses of
approximately 100 units. The denominator
in factor (17) was obtained from the consideration
that the estimate of the combined
standard deviation is obtained by
adding the variances of each mass being
combined. Inasmuch as the standard deviations
for each mass may be assumed to
be equal, the more complicated expression
for the combined standard deviation,
$\sqrt{\sum_{i=1}^{N}\sigma_i^2}$, becomes $\sigma\sqrt{N}$.

The numerator of factor (17) is merely
N times the number of units in each mass,
which again are assumed to be equal in the
long run for each mass. Now (16) may
be rewritten with the influence of the
amount of material check-coded made explicit,
as follows:

\[ \frac{\sqrt{2}\,h\sqrt{N}}{\sigma}\cdot\frac{U}{\sqrt{1+U^2}} \tag{18} \]
The influence of the bulk is shown to decrease
the value of U for increases in N,
if t is to remain constant. For given
values of $\sigma/h$ and N, the lower limits of U
at the 5% and 1% levels of significance
may be calculated from Equation (18)
and are presented in Figure 3. Note that
for values of $\sigma/h$ below .20 there are relatively
small changes in U when N is equal to or 
greater than 3. This finding is of con- 
siderable practical importance. It indi- 
cates that increases in the quantity of the 
material unitized beyond three times the 
original quantity will result in little im- 
provement in the accuracy with which the 
evaluation of unitizing reliability would be 
made. 
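
The diminishing effect of N can be tabulated directly. The sketch assumes expression (18) has the form (√2 h √N/σ) · U/√(1 + U²) and sets it equal to t; the coefficient of variation of .10 is a hypothetical value chosen for illustration, not one read from Figure 3.

```python
import math


def critical_U(sigma_over_h: float, t: float, N: int = 1) -> float:
    """U reached at normal deviate t when N masses of about 100 units
    are check-coded (assumed form of expression (18))."""
    r = t * sigma_over_h / math.sqrt(2.0 * N)  # r = U / sqrt(1 + U^2)
    return r / math.sqrt(1.0 - r * r)


# 1% level, hypothetical sigma/h = .10, for one to six masses.
values = [critical_U(0.10, 2.58, N) for N in range(1, 7)]
for N, u in enumerate(values, start=1):
    print(N, round(u, 3))
```

The drop in U from N = 1 to N = 3 is larger than the drop from N = 3 to N = 6, consistent with the observation that material beyond about three masses adds little.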

Figure 3 can be used to obtain a graphic
evaluation of obtained values of U. Suppose
an experimenter finds U in one
mass of his material to be .10. Using the
abscissa for N on the 1% level scale, locate
the point (U = .10, N = 1). This
point falls slightly below the $\sigma/h$ = .05 line.
The experimenter may conclude that the
value of U as high as .10 in the unitizing
coding would have been obtained only
once each 100 times when $\sigma/h$ = .05. Had
U been .30, the corresponding 1%
value of $\sigma/h$ needed to obtain a U as high as .30
would be approximately .15.
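
The same reading can be approximated without the figure by solving expression (18) for σ/h. The sketch assumes (18) has the form (√2 h √N/σ) · U/√(1 + U²) set equal to t; the function name is illustrative.

```python
import math


def sigma_over_h(U: float, t: float, N: int = 1) -> float:
    """Coefficient of variation sigma/h at which an observed U sits
    exactly at the normal deviate t (assumed form of expression (18))."""
    return math.sqrt(2.0 * N) * U / (t * math.sqrt(1.0 + U * U))


print(round(sigma_over_h(0.10, 2.58), 3))  # U = .10, one mass, 1% level
print(round(sigma_over_h(0.30, 2.58), 3))  # U = .30, one mass, 1% level
```

Under these assumptions the computed values are about .055 and .158, close to the .05 and approximately .15 read from Figure 3.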
SUMMARY 

The transformation of qualitative data 
obtained in interviews, autobiographies, 
free-answer questions, projective mate- 
rials, and typescripts of group meetings 
into a form which renders them sus- 
ceptible to quantitative treatment consti- 
tutes “coding.” Coding procedures in- 
volve two operations, that of separating 
the qualitative material into codable units, 
and of establishing systems of categories 
which can be applied to the unitized ma- 
terial. Generalizations about the construc- 
tion of category systems and the use of 
unitizing operations were made. It was 
possible to derive reliability estimates of 
both operations. These estimates also aid 
the investigator in deciding how large an 
amount of data needs to be check-coded 
to insure the desired level of accuracy. 
INTRODUCTION


This paper deals with the classification 
of an individual into one of two or more 
categories, on the basis of observations 
made on the individual together with a 
knowledge of the statistical distributions 
of the observed quantities for individuals 
within each of the possible categories. 
R. A. Fisher was responsible for the
discriminant function technique for Gaus- 
sian distributions with common variances 
and covariances. The theory of Neyman 
and Pearson provides a best discriminator
in general when there are two categories,
and when the set of observables is
specified, but leaves open the choice of 
threshold value for the discriminator. 
Moreover, the choice among alternative 
sets of observables cannot in general be 
made on the basis of Neyman-Pearson 
theory alone, nor does this theory pro- 
vide a complete procedure for classifica- 
tion where there are more than two cate- 
gories. The Neyman - Pearson solution 
for the case of two categories seems not 
too widely known, except among mathematical
statisticians. However, see Penrose
and Smith. It is the purpose
of this paper to give added publicity to
that theory and to implement the theory
by providing practical grounds on which
the choice of threshold and the choice
among alternative sets of observables
may sometimes be made. Extension of
these considerations is made to the problem
of classification into one of several
categories.

For purposes of concrete illustration 
consider the situation described by Berkson,
where candidates for pilot training
were subjected to batteries of psychomotor
aptitude tests, the results of which
were combined to give a single score on 
the basis of which the candidates were 
accepted or rejected for training. As 
Berkson points out, all the candidates 
may be allowed to enter training during 
the experimental phases of the selection 
program. On the basis of information 
about the candidates who are graduated
and those who are not, the selection 
criteria must be constructed, choosing 
among alternative batteries of tests, 
choosing the method of combination to 
give a single score, and choosing the 
threshold for the single score. There are 
diagnostic problems of this type in clini- 
cal work, although some diagnostic prob- 
lems are not of this form because no 
final diagnosis can be made with cer- 
tainty at any date after the initial diag- 
nosis. 

It is clear that the experimental phases 
described by Berkson provide, for any 
particular selection method, estimates of 
the probability that a potential graduate 
will be rejected, and of the probability 
that a future washout will be accepted. 
Furthermore, the experimental period 
provides estimates of the probability that 
a member of the unselected population 
will be graduated. It is this last information,
together with cost figures relating
the undesirability of rejecting a potential
graduate to the undesirability of accepting
a future washout, which permits the
choice of threshold value and the choice 
among alternative batteries of tests. 


THE NEYMAN-PEARSON CONSTRUCTION


Let two populations, A and B, have
specified multivariate distributions given
by $f_A(x,y,z,\ldots)$ and $f_B(x,y,z,\ldots)$,
respectively, for the joint distributions of
the quantities $x,y,z,\ldots$, observable for
individuals in either A or B. The distributions
may be continuous, in which
case $f_A$ and $f_B$ will be densities, or discrete,
in which case $f_A$ and $f_B$ will be
probabilities, or they may be of mixed
structure, continuous in some variables,
discrete in others. It is supposed that
$f_A$ and $f_B$ are known functions, and that
a set of values $(x,y,z,\ldots)$ has been
obtained for an individual. On the basis
of these values the individual is to be
classified as being a member of A or a
member of B. The likelihood ratio
$\lambda = f_B(x,y,z,\ldots)/f_A(x,y,z,\ldots)$ has
been shown by Neyman and Pearson
to be an optimum discriminator, in a
sense which will be discussed below.
Large values of $\lambda$ correspond to association
of the individual with B, small values
correspond to A. If a threshold $\lambda_0$ is
chosen, B is indicated if $\lambda > \lambda_0$, A is indicated
if $\lambda \le \lambda_0$. It should be noted that
when $f_A$ and $f_B$ are densities corresponding
to Gaussian distributions with the
same matrix of variances and covariances
but different means, then the logarithm
of the likelihood ratio $\lambda$ is a linear function
of the conventional discriminant
function.
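
The linearity remark can be checked in the simplest case. The sketch below assumes a single observable with Gaussian densities of common standard deviation in A and B, which is a one-variable stand-in for the multivariate statement; the means, standard deviation and function names are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_ratio_direct(x, mu_a, mu_b, sigma):
    """log(lambda) computed as log(f_B(x) / f_A(x))."""
    return math.log(normal_pdf(x, mu_b, sigma) / normal_pdf(x, mu_a, sigma))


def log_ratio_linear(x, mu_a, mu_b, sigma):
    """The same quantity written as slope * x + intercept."""
    slope = (mu_b - mu_a) / sigma ** 2
    intercept = (mu_a ** 2 - mu_b ** 2) / (2.0 * sigma ** 2)
    return slope * x + intercept


for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, log_ratio_direct(x, 0.0, 1.5, 1.2), log_ratio_linear(x, 0.0, 1.5, 1.2))
```

The two columns agree to rounding error, so thresholding λ is equivalent here to thresholding a linear score in x.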


Consider now any suggested rule, say
R, for associating A or B with an individual
on the basis of $(x,y,z,\ldots)$.
There will be, for an individual chosen
at random from A, a probability P(A;A,R)
of correctly classifying such an individual
into A. P(B;A,R) = 1 - P(A;A,R)
is the probability of incorrectly
classifying such an individual into B.
Similarly P(A;B,R) is the probability of
classifying (incorrectly) a random member
of B into A, and P(B;B,R) = 1 - P(A;B,R)
is the probability of classifying
(correctly) such an individual into B.
The dependence on R exhibits explicitly
the dependence of these numbers on the
rule of association.¹

It is intuitively clear that good discrimination
will require that P(B;A,R)
and P(A;B,R) both be small, in some
sense. The optimum property of the
likelihood ratio $\lambda$, with any threshold $\lambda_0$,
is that no other rule can have P(B;A,R)
smaller than for the rule corresponding
to $\lambda > \lambda_0$, without having P(A;B,R)
larger, and conversely. It is on this basis
that $\lambda$ is an optimum discriminator, provided
that the set of observables $(x,y,z,\ldots)$
is already specified. Note that
this optimum property is not concerned
with the value of $\lambda_0$. It can be concluded
so far only that we can safely confine our
attention to rules of the form $\lambda > \lambda_0$.

Another way of putting the optimum
property of the rule $\lambda > \lambda_0$ is that for all
rules R the quantity P(A;B,R) + $\lambda_0$ P(B;A,R)
is a minimum when the rule
$\lambda > \lambda_0$ is adopted. It appears,
then, that the rule $\lambda > \lambda_0$ minimizes a
weighted sum of the two probabilities of
misclassification, with $\lambda_0$ measuring the
relative importance of P(B;A,R) to
P(A;B,R).


MINIMIZING EXPECTED COST


It has already been pointed out that
good discrimination requires that P(A;B,R)
and P(B;A,R) both be small.
Presumably one can generally arrive at
estimates of the degrees of undesirability
of misclassifying an “A” as a “B” and
of misclassifying a “B” as an “A.” In
this context the word “cost” seems appropriate.
Denote by $W_A$ the cost incurred
by misclassifying an “A” as a “B,” and
by $W_B$ the cost incurred by misclassifying
a “B” as an “A.” In Berkson’s pilot
training example, $W_A$ would be the cost
associated with the loss of a potential
pilot and $W_B$ would be the cost associated
with training a washout. If the discriminating
rule R is used, the quantity $W_A$ P(B;A,R)
is the expected cost for a
member of the class A, while $W_B$ P(A;B,R)
is the expected cost for a
member of class B. Wald refers to
these as the “risks,” for individuals of
categories A and B respectively.

1. If A is the class of potential graduates
and B is the class of potential washouts, in
Berkson’s example, then P(B;A,R) is Berkson's
“Cost” and P(B;B,R) is his “Utility.”
These terms are not used here because the term
“Cost” is desired for another purpose.

Having attached to individuals of the
two categories an expected cost, or risk,
we could combine these two costs if we
knew the probability p that a random
member of the unselected population is
an “A.” An estimate of p is available,
in the pilot selection example, from the
experimental period experience with unselected
candidates. For a random individual,
then, the expected cost W(R)
is given by

\[ W(R) = p\,W_A\,P(B;A,R) + (1-p)\,W_B\,P(A;B,R) \]

but

\[ W(R) = (1-p)\,W_B\left[P(A;B,R) + \frac{p\,W_A}{(1-p)\,W_B}\,P(B;A,R)\right]. \]

The results of the previous section tell
us, then, that W(R) will be a minimum
if the rule $\lambda > \lambda_0$ is adopted, with
$\lambda_0 = pW_A/[(1-p)W_B]$.


The method given above permits a
choice of $\lambda_0$ to be made, whenever the
fraction p of “A’s” in the population is
known. Furthermore, it permits a choice
among alternative sets of observables.
For, suppose the observables $x',y',z',\ldots$
are available as alternatives to $x,y,z,\ldots$,
with distributions given by $g_A(x',y',z',\ldots)$
and $g_B(x',y',z',\ldots)$, then the
minimum W(R) can be calculated for
the $x',y',z',\ldots$ as well as for the $x,y,z,\ldots$,
comparative costs of obtaining
$x',y',z',\ldots$ and $x,y,z,\ldots$ can be estimated,
and a rational choice can be made.
The choice of the rule $\lambda > \lambda_0$, with
$\lambda_0 = pW_A/[(1-p)W_B]$, meets our intuitive
notions about the directions of the effects
of various changes in the parameters.
Note that larger $\lambda_0$ values mean higher
probabilities of classifying individuals
into A. It is quite reasonable that higher
values of p or higher values of $W_A$
should lead to classifying more individuals
into A. In the extreme cases, if p = 1,
there are no individuals in B, so all
individuals should be put in category A;
on the other hand, if $W_A$ is very large
compared to $W_B$, then in the limit no relative
risk is incurred by classifying all individuals
into A.

No attempt has been made here to consider
the effect of uncertainty in p, or in
$f_A$ and $f_B$ themselves, on the results. The
method of this paper is based on the assumption
that $f_A$, $f_B$, p, $W_A$, and $W_B$ are
known. A hint of some of the problems
of an exact treatment when the relevant
data are themselves estimated statistically
may be found in Wald.

It should be noted how simple is the
construction of discriminators when the
distributions for categories A and B are
discrete. Suppose there are k cells into
which all individuals fall, with probabilities
$p_A(i)$ and $p_B(i)$ associated with the
$i$th cell, $i = 1, \ldots, k$. Then $\lambda(i) = p_B(i)/p_A(i)$,
$i = 1, \ldots, k$, is the discriminator.
The level $\lambda_0$ may be calculated
from the auxiliary considerations discussed
above, and any individual whose $\lambda$
exceeds $\lambda_0$ is classified into B, otherwise
into A. For any $\lambda_0$ the probabilities
P(A;B,R) and P(B;A,R) can be calculated
easily, and the performance of the
discriminating rule can be predicted.
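
The discrete construction can be sketched in a few lines. All numbers below (four cells, their probabilities, the proportion p of A's and the two costs) are hypothetical and chosen only for illustration; the comparison `pb > lam0 * pa` is used in place of the ratio so that a cell with zero probability under A does not cause a division error. A brute-force search over every possible assignment of cells is included as a check on the minimum-cost claim.

```python
from itertools import product


def threshold(p, w_a, w_b):
    """lambda_0 = p * W_A / ((1 - p) * W_B)."""
    return p * w_a / ((1.0 - p) * w_b)


def likelihood_ratio_rule(p_a, p_b, lam0):
    """Label each cell B when p_B(i)/p_A(i) exceeds lam0, otherwise A."""
    return ["B" if pb > lam0 * pa else "A" for pa, pb in zip(p_a, p_b)]


def expected_cost(labels, p_a, p_b, p, w_a, w_b):
    """W(R) = p W_A P(B;A,R) + (1 - p) W_B P(A;B,R)."""
    miss_a = sum(pa for pa, lab in zip(p_a, labels) if lab == "B")  # P(B;A,R)
    miss_b = sum(pb for pb, lab in zip(p_b, labels) if lab == "A")  # P(A;B,R)
    return p * w_a * miss_a + (1.0 - p) * w_b * miss_b


# Hypothetical cell probabilities and costs.
p_a = [0.50, 0.30, 0.15, 0.05]
p_b = [0.05, 0.15, 0.30, 0.50]
p, w_a, w_b = 0.7, 1.0, 2.0

lam0 = threshold(p, w_a, w_b)
labels = likelihood_ratio_rule(p_a, p_b, lam0)
cost = expected_cost(labels, p_a, p_b, p, w_a, w_b)

# Exhaustive check over all 2^k assignments of cells to A or B.
best = min(
    expected_cost(list(rule), p_a, p_b, p, w_a, w_b)
    for rule in product("AB", repeat=len(p_a))
)
print(labels, round(cost, 4), round(best, 4))
```

With these inputs the likelihood-ratio rule assigns the first two cells to A and the last two to B, and its expected cost equals the exhaustive minimum.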


SEVERAL CATEGORIES 


The case of two categories covers a 
large body of problems, but in many diag- 
nostic problems there may be a choice 
among several categories. To simplify 
notation we will suppose there are three 
categories, A, B, and C. Let there be
given distributions $f_A(x,y,z,\ldots)$,
$f_B(x,y,z,\ldots)$, and $f_C(x,y,z,\ldots)$. Let
P(A;A,R) be the probability of classifying
an individual of population A into A,
let P(B;A,R) be the probability of classifying
an individual of population A into
B, etc. There are a total of nine such
probabilities all told, with the three conditions:

P(A;A,R) + P(B;A,R) + P(C;A,R) = 1,
P(A;B,R) + P(B;B,R) + P(C;B,R) = 1,
P(A;C,R) + P(B;C,R) + P(C;C,R) = 1.

Now let $W_A(B)$ be the cost of misclassifying
an “A” as a “B,” let $W_A(C)$ be
the cost of misclassifying an “A” as a
“C,” etc., so that there are six such misclassification
costs all told. Just as in the
previous section we may now define the
three expected costs:

$W_A(B)$ P(B;A,R) + $W_A(C)$ P(C;A,R),
$W_B(A)$ P(A;B,R) + $W_B(C)$ P(C;B,R),
and
$W_C(A)$ P(A;C,R) + $W_C(B)$ P(B;C,R).

If, moreover, the members of A, B, C
are present in the unselected population
in relative amounts $p_A$, $p_B$, $p_C$, we have,
for any discriminating rule R, the total
expected cost:

\[ \begin{aligned} W(R) ={}& p_A W_A(B)\,P(B;A,R) + p_A W_A(C)\,P(C;A,R) \\ &+ p_B W_B(A)\,P(A;B,R) + p_B W_B(C)\,P(C;B,R) \\ &+ p_C W_C(A)\,P(A;C,R) + p_C W_C(B)\,P(B;C,R). \end{aligned} \]

It can be shown that there is a simple
rule for which W(R) will be a minimum.
Set

\[ \begin{aligned} u_A &= p_B W_B(A) f_B + p_C W_C(A) f_C, \\ u_B &= p_A W_A(B) f_A + p_C W_C(B) f_C, \\ u_C &= p_A W_A(C) f_A + p_B W_B(C) f_B. \end{aligned} \]

Then a rule with minimum expected cost
W(R) is one which classifies an individual
into A if $u_A$ is the smallest of $u_A$,
$u_B$, $u_C$, into B if $u_B$ is the smallest, and
into C if $u_C$ is the smallest. W(R) is
unaffected by the choice among quantities
tied for the smallest value.
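
A minimal sketch of the three-category rule follows. The density values at the observed point, the population proportions and the costs are all hypothetical; `cost[true][assigned]` stands for the cost of misclassifying a member of `true` into `assigned`, and each score is the quantity u for that assigned category.

```python
def expected_cost_scores(f, prior, cost):
    """u for each candidate category: the sum over the other categories of
    prior[true] * cost[true][assigned] * f[true]."""
    scores = {}
    for assigned in f:
        scores[assigned] = sum(
            prior[true] * cost[true][assigned] * f[true]
            for true in f
            if true != assigned
        )
    return scores


def classify(f, prior, cost):
    """Assign the category whose u is smallest."""
    scores = expected_cost_scores(f, prior, cost)
    return min(scores, key=scores.get)


# Hypothetical density values at one observed point, and proportions.
f = {"A": 0.2, "B": 0.5, "C": 0.1}
prior = {"A": 0.5, "B": 0.3, "C": 0.2}

# Equal unit costs for every kind of misclassification.
unit_cost = {t: {a: 1.0 for a in f if a != t} for t in f}
print(classify(f, prior, unit_cost))   # B

# Same case, but calling an "A" a "B" is made five times as costly.
skewed_cost = {t: dict(row) for t, row in unit_cost.items()}
skewed_cost["A"]["B"] = 5.0
print(classify(f, prior, skewed_cost))  # A
```

With equal costs the rule picks the category with the largest product of proportion and density; raising one cost shifts the decision away from the expensive error.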

It can be shown (see Wald) that discriminating
rules of this form, constructed
with arbitrary probabilities $p_A$, $p_B$, $p_C$,
and arbitrary costs $W_B(A)$, $W_C(A)$, etc.,
have an optimum property similar to the
optimum property of the rule $\lambda > \lambda_0$, in
that no damage is done by confining one’s 
attention to rules of this form, even if the 
quantities needed are not known. 
DISCUSSION*


JOHN W. TUKEY 


Princeton University 


INTRODUCTION 


These papers are very heartening to the 
statistician interested in method and in the 
progress of psychology. The various con- 
tributors have faced real problems with 


sound approaches. Some of them have 
been very successful and others have found 
difficulties, but all of them deserve much 
continuing help from statisticians in 
adapting known methods to new problems 
and developing new methods. It is pleasant
to try to do a little of this in discussion.

* Prepared in connection with research spon- 
sored by Office of Naval Research. 


THE PLACE OF THE CORRELATION
COEFFICIENT 


Baldwin's paper on case histories merits 
all the pleasant remarks just made. A real 
problem is being analyzed with careful
thought, without blind acceptance of
techniques from other fields, with the possible
exception of the use of correlation
coefficients. The correlation coefficient 
has two real excuses for preference over 
other measures of relation or association. 
First, it is independent of changes in scale 
of the two variables concerned, so that it 
can properly be used when the size of the
units are devoid of meaning. Second, it
is symmetrical in the two variables, so 
that it can properly be used when the 
causal relation of the variables is unclear 
or nonexistent. For either excuse to func- 
tion, it is usually necessary for there to be 
a well defined population in which we are 
interested. Other measures have other 
advantages, and it is generally true that 
the use of the correlation coefficient is only 
wise when one or both of these excuses 
works strongly in its favor. 

The possibility of going on to a factor 
analysis is mentioned toward the end of 
the paper. I take this to mean the analy- 
sis of the development of several indi- 
viduals, selected to be relatively similar, 
because the measure-to-measure dependence,
already pointed out by Baldwin, prevents
the accumulation of large amounts
of independent data on one individual. 
Given several individuals, it seems almost 
axiomatic that they will not go through 
the same situations, even if external fac- 
tors are held constant, because of indi- 
vidual differences in initial state and re- 
sponsiveness. Thus no single well defined 
population exists to justify the use of cor- 
relation coefficients. Unfortunately, the 
appropriate replacement for the correla- 
tion coefficient does not seem to be at hand. 
Progress in this field needs the formula- 
tion of more specific and complex models 
based on psychological perceptions and 
intuitions. Statisticians can help in this 
formulation and particularly in the devel- 
opment of appropriate methods of analy- 
sis, such as a replacement for the correla- 
tion coefficient, but the basic responsibil-
ity must be the psychologist’s. The sort
of analysis put forward in Baldwin's paper 
is a useful stepping-stone. 


PREDICTION OR DESCRIPTION 


Kubis discussed the sort of job that op-
timistic statisticians have felt could be
done. It is nice to know that it has been
done. Its example stresses again that the
problem of specific prediction is far easier
than that of simultaneous description in
many dimensions.


MEASURING THE AGREEMENT OF JUDGES 


Kogan and Hunt have brought out a 
very real and interesting problem. For my 
part, I should pass rapidly over Method I,
for the application of a t-test to such
highly correlated data is very misleading. 
(For quantitative examples see Walsh, 
Annals of Math. Statistics, 1947, 18, 88- 
96, especially p. 91.) This is not because 
of the matter of degrees of freedom, for 
it is well known (and pointed out again 
in Kogan and Hunt’s Tables 2 and 3) 
that the degrees of freedom do not affect 
the significance of t very much. The real 
difficulty lies in the division of the esti-
mated standard deviation of the popula-
tion by √n in order to obtain an estimate
of the standard deviation of the mean. 
This is legitimate only when the n ob- 
servations are independent, as they surely 
are not here. 
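
The size of the effect can be illustrated under a simplifying assumption that is not in the original discussion: suppose each of the n observations has variance σ² and every pair of observations has the same correlation ρ. Then the variance of the mean is

$$\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n}\,\bigl[1 + (n-1)\rho\bigr].$$

With ρ = 0 this is the familiar σ²/n. As ρ approaches 1 it approaches σ², so that dividing the standard deviation by √n understates the standard deviation of the mean by a factor approaching √n.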

Coming next to Kogan and Hunt’s 
Method II, whose basic feature is work- 
ing with the standard deviations of the 9 
scores for each case, we should emphasize 
the purpose of any analysis of this kind— 
to provide a lumped measure of both sys- 
tematic differences between judges and 
their random or apparently random fluc- 
tuations. The method is sound, although 
modifications can be made. 

I myself should analyze not the standard 
deviations themselves but their logarithms 
(or, what is equivalent, the logarithms of 
the variances), and, since either the stand-
ard deviations or the logarithms of the
standard deviations are strongly paired, 
should, in any case, analyze paired differ- 
ences with a t-test, which is of course 
equivalent to an analysis of variance with 
two rows. Why I would use logarithms
is somewhat hard to explain, and would 
take too long for the present situation, so 
it must suffice to say that (1) there are 
philosophical reasons and (2) experience 
in the analysis of variability shows that, 
at least in some cases, it works! 
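
A minimal sketch of the paired analysis on logarithms may help. The function name and the standard deviations below are hypothetical illustrations, not Kogan and Hunt's data; only the procedure (paired differences of logarithms, then a t statistic) follows the text.

```python
import math
import statistics


def paired_t_on_logs(before, after):
    """Paired t statistic for log(before) - log(after); returns (t, degrees of freedom)."""
    diffs = [math.log(b) - math.log(a) for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-case standard deviations of the judges' scores (illustration only).
sd_before = [0.9, 1.2, 0.7, 1.1, 0.8, 1.0]
sd_after = [0.7, 1.0, 0.6, 0.8, 0.8, 0.7]

t, df = paired_t_on_logs(sd_before, sd_after)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Passing variances instead of standard deviations doubles every logarithm and therefore leaves t unchanged, which is the equivalence noted above.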

Then we have the analysis of variance 
method—Method III. This has very 
great possibilities, since it allows us 
to separate, for the first time in these 
methods, the systematic differences be- 
tween judges from random and apparently 
random differences. This is practically 
important, since methods for reducing
these two kinds of difference will usually 
be different. There are various possibil- 
ities for modification of this method which 
may be worthwhile. The simplest of these 
provides a way to help us out of the di-
lemma about the number of degrees of
freedom really available. (Kogan and 
Hunt are to be highly commended for 
their careful attitude toward the large 
number 304. C. P. Winsor has put the 
need for caution in the form: “No one 
ever had more than 100 (or perhaps 150) 
degrees of freedom in one place!” The 
reason for this is simple. Small amounts 
of nonrandomness and small deviations 
from normality give effects comparable 
with sampling fluctuations when so many 
degrees of freedom are involved.) Let us
take these 38 cases and divide them at 
random, perhaps with the aid of random 
numbers, perhaps in terms of order of 
presentation, into four samples of 9, 10, 
9 and 10 respectively. We can then, by 
Method III, analyze separately the 4 com-
binations of the same 9 judges with each
of the 4 sets of 9 or 10 cases. Thus we
will produce four error mean squares be- 
fore training, paired with 4 error mean 
squares after training. These 4 pairs of 
error mean squares are independent, once 
we regard the nine judges as fixed—so 


that we can apply a t-test on 3 degrees of 
freedom on the differences of pairs of 
mean squares or, what is perhaps better, 
on the differences of pairs of logarithms 


of mean squares. We have thrown away 
a little information, but we have brought 
ourselves to a situation where we have a 
small and reliable number of degrees of 
freedom. This type of modification is
feasible in many problems and is some- 
thing every analyst should bear in mind. 
This modification provides a clear and 
valid test of whether for these 9 judges 
training has reduced differences other 
than those which can be accounted for by 
an additive constant for each judge. 
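
The blocking device can be sketched as follows. The data are synthetic stand-ins (38 cases, 9 judges, with noise levels chosen arbitrarily), and the function names are hypothetical; the sketch only illustrates splitting the cases at random into blocks of 9, 10, 9 and 10, computing an error mean square before and after in each block, and applying a t-test on 3 degrees of freedom to the differences of logarithms.

```python
import math
import random
import statistics


def error_mean_square(scores):
    """Remainder mean square of a cases-by-judges table (two-way layout, one score per cell)."""
    n_cases, n_judges = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_cases * n_judges)
    case_means = [sum(row) / n_judges for row in scores]
    judge_means = [sum(row[j] for row in scores) / n_cases for j in range(n_judges)]
    ss = sum(
        (scores[i][j] - case_means[i] - judge_means[j] + grand) ** 2
        for i in range(n_cases)
        for j in range(n_judges)
    )
    return ss / ((n_cases - 1) * (n_judges - 1))


def block_t(before, after, block_sizes, rng):
    """t statistic and degrees of freedom for block-wise differences of log error mean squares."""
    order = list(range(len(before)))
    rng.shuffle(order)  # divide the cases at random
    diffs, start = [], 0
    for size in block_sizes:
        idx = order[start:start + size]
        start += size
        ms_before = error_mean_square([before[i] for i in idx])
        ms_after = error_mean_square([after[i] for i in idx])
        diffs.append(math.log(ms_before) - math.log(ms_after))
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error, len(diffs) - 1


# Synthetic stand-in data (NOT Kogan and Hunt's): same cases and judges before and after.
rng = random.Random(0)
truths = [rng.uniform(-2, 4) for _ in range(38)]
biases = [rng.gauss(0, 0.5) for _ in range(9)]


def fake_table(noise_sd):
    """Scores = case value + judge constant + random fluctuation (assumed model)."""
    return [[t + b + rng.gauss(0, noise_sd) for b in biases] for t in truths]


before, after = fake_table(0.7), fake_table(0.55)
t, df = block_t(before, after, [9, 10, 9, 10], rng)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Because the judges' additive constants are removed within each block, the test addresses only differences beyond those constants, as described above.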

Now we know anyway that “system-
atic” differences between judges are not
restricted to additive constants. We have 
all known pairs of teachers who give about 
the same average grade, one of whom 
will give many more A’s and E’s than 
the other. Anchoring the scale will doubt- 
less reduce such effects, but it could not 
remove them entirely. If we wish to 
separate systematic and random differ- 
ences, and probably this was the reason 
for going to the analysis of variance, then 
it seems reasonable to wish to shift these 


differences in “slope” to the systematic 
category. How shall we do this? There 
are various alternatives, no one of which 
has been thoroughly studied. 

Only one is simple enough to discuss 
here. It is quite feasible with data such 
as Kogan and Hunt’s where the error 
sum of squares is relatively minor. This 
is to take the mean scores obtained for all 
judges as sufficiently correct and then fit 
a regression line for each judge of his 
scores on the average scores assigned. 
This will remove an additional 7 degrees 
of freedom from the error line. This de- 
vice can, of course, be combined with the 
first possibility of breaking up into blocks. 
(It might be wise to take out the “first 
principal bilinear component,” instead, 
but no one knows.)
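
As a rough sketch of the regression device, each judge's scores can be regressed on the mean score assigned to each case. The function name and the three-judge table are hypothetical, chosen so that one judge stretches the scale, one compresses it, and one merely adds a constant.

```python
def judge_regressions(scores):
    """Least-squares (intercept, slope) of each judge's scores on the case means."""
    n_cases, n_judges = len(scores), len(scores[0])
    x = [sum(row) / n_judges for row in scores]  # mean score per case
    x_bar = sum(x) / n_cases
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    fits = []
    for j in range(n_judges):
        y = [row[j] for row in scores]
        y_bar = sum(y) / n_cases
        slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        fits.append((y_bar - slope * x_bar, slope))
    return fits


# Hypothetical table: rows are cases, columns are three judges.
values = [-2, -1, 0, 1, 2, 3, 4]
scores = [[1.5 * v, 0.5 * v, v + 1.0] for v in values]

for judge, (intercept, slope) in enumerate(judge_regressions(scores)):
    print(f"judge {judge}: intercept {intercept:+.2f}, slope {slope:.2f}")
```

Since the case means are themselves averages of the judges' scores, the fitted slopes necessarily average to one; differences among the slopes are the "slope" differences that this device moves to the systematic category.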

Now Kogan and Hunt have really con- 
sidered two forms of Method III, the con- 
sideration of mean squares (which we 
may call Method IIIa) and the compari-
son of percentages of sums of squares
(which we may call Method IIIb). This 
latter method provides a means for taking 
account of such things as a general change 
of “slope” for the whole panel of judges. 
Such a change, of about 5%, seems to 
have occurred during the training in Ko- 
gan and Hunt's example. All the modi- 
fications and devices discussed above can 
be applied to both Method IIIa and Meth-
od IIIb.


FINENESS OF GRADUATION AND
ROUNDING ERRORS


There is a very important point in the 
magnitude of the remainder mean square. 
Kogan and Hunt refer to “a rating which
may vary from −2, indicating deteriora-
tion, to +4, signifying marked improve-
ment.” It seems reasonable to suppose
that the actual ratings given were −2,
−1, 0, +1, +2, +3, or +4. What about
the effect of the size of step? Have enough
steps been used?

The problem of choosing the right num-
ber of steps in a scale is partly statistical
and partly psychological. Unfortunately, 
the statistical facts about the situation are 
far from clear to most people, who have 
been taught about “significant figures” in 
an unrealistic way, and who have never 
taken the time to analyze the situation 
themselves. Admittedly there is a psy-
chological bar to ultrafine scales. If you
ask the average man to read a yardstick to 
a thousandth of an inch or to estimate 
goodfellowship on a scale of 100 steps, he 
won't cooperate. But many times scale- 
makers and measurers stop far short of 
the limit set by the danger of non-coopera-
tion. They do this, I fear, because of
ignorance of the statistical situation and 
past exposure to inadequate discussions 
of “significant figures.” 

Qualitatively, the basic fact about fine- 
ness of scales is this: they should be fine 
enough so that one cannot exactly check 
the reading or judgment too often. As a 
rough rule, an exact agreement of inde- 
pendent duplicates in 10% or less of all 
cases indicates that further refinement of 
the scale will not help appreciably from the 
statistical point of view. If more than 10% 
are exact checks, either the scale is too 
coarse or the duplicates are not independ- 
ent. Now independent duplicates are hard 
to obtain in many situations; if the same 
observer is used he remembers his pre- 
vious reading, if two observers are used 
they have a systematic difference. Allow- 
ance for observer differences brings us to 
the analysis of variance. So usually a 
mean square for remainder (discrepance, 
error, balance, or what you will) ex- 
pressed in scale units best compares the 
variability of measurement or judgment
with the fineness or coarseness of the scale.
Just how, we shall see!

If the judgments which are to be scaled
have a reasonably continuous distribution
over a number of steps of the scale (if they
all fall on one or two steps it is surely time
for a better scale) then it is usually
sufficient to treat the effect of converting
continuously variable judgments into steps
on a scale as a source of random variation,
independent of the other effects. This
is precisely analogous to treating the
“rounding errors” of a digital computing
machine as if they were random noise. In
either case we have a very useful first ap-
proximation.

Any new independent source of error
affecting all measurements or judgments
adds a constant to the expectation of all
the mean squares in the analysis. The
amount of this increment will depend on
just how the rounding is done. Four
plausible alternatives are:

(i) to the nearest step (excellent round-
ing),
(ii) to one of the two nearest steps, with
probability complementary to the
distance (good rounding),
(iii) to one of the two nearest steps with
equal probability (capricious round-
ing),
(iv) to the further of the two nearest
steps (perverse rounding).

An example of (ii) may be helpful. 3.27
would be rounded to 4 in 27 cases in 100
and to 3 in the other 73. (This is the only
case that is individually unbiased!) A
little computation lets us express the in-
crement to the mean squares in these four
cases as simple fractions, namely

Suggested     Type of adjustment            Average contribution
name                                        to each mean square*

Excellent     To nearest integer            1/12 = 0.08
Good          To an adjacent integer        1/6  = 0.16
Capricious    To an adjacent integer,       1/3  = 0.33
              chosen at random
Perverse      To the further of the         7/12 = 0.58
              two adjacent integers

* Expressed in scale units.

A contribution of any amount between 
0.08 and 0.33 from the necessity of choos- 
ing an integer alone would seem to indi- 
cate that the judge was finding it relatively 
easy to convert his impression into a rela- 
tively definite step of the scale. When we 
consider that Kogan and Hunt's re-
mainder mean squares of 0.47 before train-
ing and 0.32 after training are the sum of
an effect of this sort and of the actual 
variations in judgment of the judges, we 
see that “rounding errors” must be sup- 
plying a substantial part of the variability, 
and that we could measure the actual 
variability of the judges more precisely if 
the effect of “rounding” were to be re- 
duced. This, I understand, happens in- 
frequently in rating situations, but Kogan 
and Hunt seem to have been unusually 
successful in stabilizing their scale. If the
scale were divided into twice as many 
steps, each step being half the size of 
the present scale, the effect of rounding 
(measured in old units!) would be di- 
vided by 4, and would not blur the meas- 
urement of the variability of judges so 
badly. 
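
The four rounding rules, and the effect of halving the step, can be checked by a small simulation. This is a sketch under the stated approximation that the continuous judgment is spread evenly across many steps; the function name, the range 0 to 10, and the sample size are arbitrary choices made for illustration.

```python
import math
import random


def rounding_mean_square(rule, step=1.0, n=100_000, seed=0):
    """Monte Carlo mean squared rounding error for one of the four rules."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)       # continuous judgment, many steps wide
        k = math.floor(x / step)
        frac = x / step - k              # position between the two nearest steps
        if rule == "excellent":          # nearest step
            up = frac >= 0.5
        elif rule == "good":             # upper step with probability frac
            up = rng.random() < frac
        elif rule == "capricious":       # either step, equal probability
            up = rng.random() < 0.5
        elif rule == "perverse":         # the further step
            up = frac < 0.5
        else:
            raise ValueError(rule)
        rounded = (k + (1 if up else 0)) * step
        total += (rounded - x) ** 2
    return total / n


theory = {"excellent": 1 / 12, "good": 1 / 6, "capricious": 1 / 3, "perverse": 7 / 12}
for rule, expected in theory.items():
    simulated = rounding_mean_square(rule)
    print(f"{rule:10s} simulated {simulated:.3f}  theory {expected:.3f}")

# Halving the step should divide the contribution (in old units) by about 4.
print(rounding_mean_square("excellent", step=0.5) * 4)
```

The simulated values should fall close to the fractions 1/12, 1/6, 1/3 and 7/12, and the half-step run close to one quarter of 1/12.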


JUDGE-TO-JUDGE VARIATION AND
NUMBER OF ANCHORS


It is also interesting to inquire how 
much of the judge-to-judge variability can 
be accounted for by allowing for their esti- 
mated variability as applied to the cases 
used to anchor the scale. If the scale is
anchored by k cases, and if the mean
square for a single judgment before train-
ing is 0.47, the variance of a single judge’s
mean would be

$$\frac{0.47}{k}.$$

This same quantity can be estimated from
the mean squares for judges and remain-
der as

$$\frac{1.74 - 0.47}{38} = 0.034.$$

The two estimates coincide for k a little
less than 14. (Allowance for rounding
errors would reduce this value.) It would
be interesting to know how many cases
were used to anchor the scale. (I have
since learned that 3 cases were used as
anchors, so that, if the error variance were
the sum of an intrinsic variance and a
rounding variance, while the systematic
judge-to-judge shift were due solely to
the intrinsic variance at the anchors, we
find estimates of 0.10 for the intrinsic
variance and 0.37 (before) and 0.22
(after) for the rounding variance. These
do not seem entirely unreasonable.)

The complication and delicacy of con-
sideration which this problem has shown
is, perhaps unfortunately from the clini-
cian’s point of view, characteristic of a
wide class of problems with whose solu-
tion he is concerned. The saving grace is,
perhaps, the great extent to which we
have been able to answer (or attack) many
interesting questions from the numbers in
one analysis of variance.
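
The anchoring estimates quoted in this section appear to follow from arithmetic of this kind (a reading of the figures given, using the rounded value 0.034, rather than a derivation stated in the original):

$$k \approx \frac{0.47}{0.034} \approx 13.8, \qquad \text{intrinsic variance} \approx 3 \times 0.034 \approx 0.10,$$

$$\text{rounding variance} \approx 0.47 - 0.10 = 0.37 \ \text{(before)}, \qquad 0.32 - 0.10 = 0.22 \ \text{(after)}.$$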


QUANTIFICATION


Rabin’s paper discusses the history of 
quantification of the Rorschach and the 
difficulties which seem to face us at pres- 
ent. I say seem, for there are approaches 
which may circumvent some of them. Why 
ask the quantitative technique to do all 
the work? Let us suppose that we can 
obtain the cooperation of ten experienced 
and skillful clinicians. Let us prepare a 
large number of dummy Rorschach pro- 
tocols, determining the scores by random 
numbers, with due allowance for promi- 
nent known correlations, and then submit 
them to the ten clinicians, asking each time 
first for placement of the case with regard 
to the personality characteristics we wish 
to study, and second for a list of questions 
whose answers would improve diagnosis. 
By studying these results, it should be pos-
sible to construct composite scores which
would be as predictive as a good clinician 
who has not seen the subject. 

Some of my pessimistic friends feel that 
the clinicians would not agree—that this 
process would fail. If so, it still seems 
inappropriate to ask statisticians to take 
over the whole analysis. They could un- 
doubtedly do it alone eventually, but who 
would want to pay for the vast experi- 
ments? If the psychologists were to con- 
tribute their shares to the building of 
models, algebraic or worse, of how these 
factors interact with one another, the joint 
team of statisticians and psychologists 
would get there quicker. In all these fields 
we are going to need definite models, ap- 
propriate to the situation, and teamwork is 
the quickest and best way to get them. 


TESTING CORRELATED MEASUREMENTS 


Cronbach's discussion is still more con- 
cerned with method. Accepting the frame- 
work within which he puts the question, 
it is thus easier to make comments which 
may be of immediate help. One great prob-
lem which concerns him is the problem
of correlated measurements. If we knew 
the distributions in the populations, we 
could perfectly well choose linear com-
binations of test scores which are uncor-
related—in terms of these new variables 
our significance tests would be independ- 
ent, all our processes would be simple. 

If we have a situation which is a multi- 
variate analog of any standard analysis of 
variance situation, then we can do this 
from the experimental data. One such case 
arises when we have multiple scores on 
several groups, and it is reasonable to as- 
sume that the variances and covariances 
within the groups are the same from group 
to group (but not from variable to vari- 
able). In this case we may choose linear 
combinations which are uncorrelated with- 
in groups and then make substantially in- 
dependent tests between groups in the 
different combinations. We can clear up 
many of our difficulties by transforming 
to uncorrelated combinations!
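
For two scores the transformation can be sketched very simply. The sketch assumes, as in the text, a common within-group covariance structure; it keeps the first score and replaces the second by its residual on the first, which is one of many possible choices of uncorrelated combinations. The function names and data are hypothetical.

```python
def pooled_within(groups):
    """Pooled within-group variance of score 1, covariance, and variance of score 2."""
    s11 = s12 = s22 = 0.0
    df = 0
    for g in groups:
        n = len(g)
        m1 = sum(a for a, _ in g) / n
        m2 = sum(b for _, b in g) / n
        s11 += sum((a - m1) ** 2 for a, _ in g)
        s12 += sum((a - m1) * (b - m2) for a, b in g)
        s22 += sum((b - m2) ** 2 for _, b in g)
        df += n - 1
    return s11 / df, s12 / df, s22 / df


def decorrelate(groups):
    """Replace (x1, x2) by (x1, x2 - beta * x1), uncorrelated within groups."""
    s11, s12, _ = pooled_within(groups)
    beta = s12 / s11
    return [[(a, b - beta * a) for a, b in g] for g in groups]


# Hypothetical pairs of scores in two groups (illustration only).
groups = [
    [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 8.0)],
    [(5.0, 1.0), (6.0, 2.5), (7.0, 5.0), (8.0, 7.5)],
]
new_groups = decorrelate(groups)
print(pooled_within(new_groups)[1])  # pooled within-group covariance after transforming
```

Tests between groups can then be made separately on the first score and on the adjusted second score.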


EXPLORATION AND CONFIRMATION


Similarly we may have difficulty with 
that sort of lack of planarity which can be 
removed by transformations, by express- 
ing the individual scores in other terms. 
Here there is no automatic process to be 
recommended, and our attempts must be 
guided by psychological insight. There 
will be those who say that by using non- 
automatic processes we are distorting our 
significance levels and our other probabil-
ity estimates by an unknown amount. This
is true, but it neglects an important dis-
tinction. In every field of science, and par-
ticularly in fields where data and analysis
are complex, there are two different phases
of quantitative analysis—exploration and
confirmation—and almost always, when 
dealing with complex problems, these have 
to be carried out on different samples of 
data. The problem, one example of which
faces Cronbach, seems to be chiefly one of 
exploration, of finding what seem to be 
reasonable ways to summarize the data. 
When these ways are synthesized, new 
data will be needed to confirm them. But 


the first emphasis must be on perspicuous 
methods of exploration, which should be 
graphical whenever possible. 


ESTIMATION WITH CORRELATED ERRORS 


Cronbach has pointed out the problem 
of estimation from unequally-fallible cor-
related scores—the essentials here are
knowledge of the correlations, both of the 
true scores and of the errors, and knowl- 
edge of what properties of estimate are re- 
quired! In how many cases do we know 
all three? Yet with incomplete informa- 
tion, the problem is meaningless! 


LOCALIZATION BY PEELING


In addition to the localization process 
which Cronbach has discussed, there is 
now available another one of fairly wide 
application. The basic idea is that of
generalized tolerance regions and statis-
tically equivalent blocks, and is a general-
ization of order statistic procedures. Let 
us suppose that a competent judge is given 
the k scores of N cases which fall into 2 
types and is not told which sets of scores 
belong to individuals of which types. Sup- 
pose further that he is told, in general 
terms, the region in which one type is be- 
lieved to be concentrated. Then he can 
peel the cases off, one by one, starting far 
from that region and working toward it. 
The result is an arrangement of the cases 
in a sequence. If the hypothesis of con- 
centration is false, the two types will be 
scattered at random through the sequence;
if it is true one type will be concentrated 
near one end. This is easy to test. The 
big disadvantage of this procedure is that 
it depends on ignorance of the cases by the 
judge. This is parallel to the requirement 
of selecting the region before examining 
the data in the conventional process. The 
new process should be useful, however, if 
some 10 to 25 general regions were chosen, 
and a competent judge (or judges) un- 
familiar with the data were asked to make 
all 10-25 orderings before any were scored 
as to arrangement of types. 
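
The text does not name the test to be applied to the peeled sequence. One conventional possibility, offered here only as an assumption, is a rank-sum statistic on the positions of one type, referred to a normal approximation. The function name and the example sequences are hypothetical.

```python
import math


def rank_sum_z(sequence, target):
    """Normal-approximation z for the sum of positions held by `target` in the sequence."""
    n = len(sequence)
    ranks = [i + 1 for i, label in enumerate(sequence) if label == target]
    n_target = len(ranks)
    n_other = n - n_target
    mean = n_target * (n + 1) / 2
    variance = n_target * n_other * (n + 1) / 12
    return (sum(ranks) - mean) / math.sqrt(variance)


# Cases listed in the order peeled, far from the region first (illustration only).
concentrated = list("BBBBBAAAAA")   # type A gathers near the region
scattered = list("ABABABABAB")      # types mixed through the sequence

print(rank_sum_z(concentrated, "A"), rank_sum_z(scattered, "A"))
```

A large positive z indicates that the chosen type is concentrated at the end of the sequence nearest the region; a value near zero is what random scattering would give.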


MANIFOLD TYPES


Cronbach's emphasis on the need of the 
psychologist to recognize and separate 
“types of good personality” is surely
sound, important, and ultimately helpful 
for statistical analysis. 

MORE SENSITIVE TESTS FOR
“BUNCHING”

Cronbach asks for a test for a systematic 
deviation of the distributions of two popu-
lations on a single variable, and a general-
ization to several variables. Such tests
can be provided in the single-variable 
case, but just because psychologist and 
statistician are agreed here on the meaning 
of systematic. The advantage of such tests 
over chi-square is simple—by neglecting 
unsystematic deviations the test can con- 
centrate its allowed percentage (5%, 1%, 
or what have you) of false positives on 
samples which seem to have systematic 
deviations. Thus a smaller deviation, if 
systematic, will be recognized as signifi- 
cant than in the chi-square test, and rightly 
so. 


PREDICTION IN TWO OR MORE
DIMENSIONS


The problem of comparing multidimen- 
sional predictions with multidimensional 
criteria can be attacked by that multidi- 
mensional generalization of the analysis of 
variance that I have called “dyadic analy- 
sis of variance” and which was discussed 
in some detail at the meeting of the Insti- 
tute of Mathematical Statistics in Prince- 
ton in November 1946 (and will, I hope, 
appear shortly in Human Biology). Only 
experience can tell whether this tool will 
meet the need, but it provides a way of 
dealing with the problem which allows for 
such matters as the difference in shape of 
the distributions of error and of persons, 
which was pointed out by Cronbach. 

THE LOGIC OF INVERSE ANALYSIS


Stephenson’s paper is interesting, 
though the thought is occasionally hard 
to follow for a statistician with a meagre 
background in this area. For example, the 
meaning of “I have reserved the right to 


distinguish between universes which I
sample (and which provide me with error
estimates), and variables (such as per-
sonalities) that I wish to manipulate de-
ductively, and to which sampling condi-
tions in no way apply” is hard to grasp. I
should like to set out in other language
what it seems to me that Stephenson is
saying, so that I may be corrected as neces-
sary. I suggest that what is meant is
roughly the following:

There are a large number of indi- 
vidual, definite statements which can 
be recognized by experts as relevant to 
some aspect of personality in some class 


of situation. Lists of a substantial num- 
ber of these can be prepared, and can be 
regarded as psychologically meaningful;
samples can be drawn from the lists. By 
taking a few people, and relating them 
to the statements in a sample list we can 
“correlate” the persons on the state- 
ments of the sample list and make infer- 
ences to the correlation which would 
have been obtained from the whole list. 

It is psychologically reasonable: (1) 
to analyze the sample-list correlations 
in terms of empirically found factors, 
with the understanding that these fac- 
tors are relative to the particular small 
group of subjects used (and need not 
apply to larger groups or to our culture 
in general), and to the prepared list of 
statements; (2) to use this factor analy-
sis to confirm or reject psychological
hypotheses in a way which has so far 
not been made completely clear or 
definite. 


The paraphrase just proposed may seem 
unduly harsh to some, but it is only in 
terms like this that I am able to conclude 
“they appear to mirror something of 
themselves in the ideal of their own type” 
as applying in some generality rather than 
only to graduate students of psychology at 
the University of Chicago with certain in- 
terests. The need for bearing in mind that 
such steps are grounded on psychology 
and not on statistics seems the most im- 
portant comment on this paper. 


THE NATURE OF SCALES


Gardner’s paper strikes at a central 
problem of psychological measurement— 
the synthesis of rational scales. The dis- 
cussion of scales of different types, notably 
by Stevens, has been useful and illuminat- 
ing. The purity and uniqueness of the 
scales of physics has, it seems to me, been 
exaggerated. The physicist often gives the 
impression that the scales of mass, length, 
time and temperature are obvious, and so 
unique that it would be hard to bypass 
them. It may be worthwhile to point out 
the extent to which this does not apply to 
length. 

Suppose we take the basic operation 
with two lengths to be the placing of them 
end to end at right angles. This is as well 
defined geometrically as placing them end 
to end in the same direction. But if we do
this, the natural scale, in whose terms the 
basic operation is addition, is the square 
of the ordinary length. (Recall the theo- 
rem of Pythagoras!) Thus even for length 
there is no unique natural scale, until we 
specify the purpose for which the scale is 
to be used. 
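
In symbols (the notation is introduced here for illustration): if lengths a and b are placed end to end at right angles, the resulting length c satisfies

$$c^2 = a^2 + b^2,$$

so that the scale s(x) = x^2 is the one on which this operation is addition:

$$s(c) = s(a) + s(b).$$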


GARDNER'S SCALING SITUATION 

With this point clearly in mind, we can 
examine Gardner's technique critically 
while continuing to appreciate its substan- 
tial value. Gardner has demonstrated the 
practicality of scale synthesis from a col- 
lection of populations, without the as- 
sumption that the distribution is the same 
in each population. This is a most impor- 
tant step ahead, and is far more important 
than the particular methods of synthesis 
he has used. 

The fitting procedure sketched by Gard-
ner is almost certainly far from efficient; it
is now a responsibility of the mathematical 
statisticians to find a reasonably efficient 
one. 

Now when we come to examine the data 
for which Gardner synthesized a scale, 
there are certain points which come to 
mind. The nature of students in grade 
five, in grade six, and in grade seven is 
not that of three discrete populations. The 
exact chronological ages at entrance to 
first grade must range through 12 months, 
and there must be further spread due to 
differences in rate of development, par- 
tially compensated by the practices of 
skipping and repeating grades. Unless we 
adopt the, to the writer foolishly naive,
conception that “grades passed through” 
really measures the status of a student’s 
educational development, and unless we 
get data on students from very similar 
school systems, at the same time of the 
school year, we must at least regard the 
fifth-grade, sixth-grade, and seventh- 
grade populations as made up of many 
subpopulations, which probably overlap 
but can perhaps be idealized as sections 
from a continuum of subpopulations. 

In the two cases scaled by Gardner
which I have seen and in terms of his 
scales, the standard deviation of the dif- 
ferent grades shifts systematically. It is 
natural to suppose that the same is true of 
the supposed subpopulations. If this were 


so, and if the subpopulations were sym- 
metric, then the grade populations would 
be skew. In every case which I have had 
a chance to examine, the average skewness 
obtained by Gardner’s fitting has the sign 
which one would expect from this model. 
(The significance of the detailed fluctu-
ations is hard to estimate.)

It behooves some of us to develop a sys- 
tem of synthesizing scales based on such a 
model, to apply it to data, and to investi- 
gate whether its conclusions are less or
more useful than those obtained by Gard- 
ner’s technique. In any event, it has been 
Gardner’s privilege to open a new chapter 
in the synthesis of scales. 


INVERTING A TECHNIQUE 


Horn has nicely inverted a known type 
of analysis from inter-individual to intra- 
individual. More of this will be done in 
the future. If I had contributed a paper, 
rather than discussion to this program, the 
title would have been “Are the clinician’s
problems different?” Horn’s paper indi- 
cates why the answer is “No!” from the 
statistician’s viewpoint—the analysis is 
the same. 

CODING 


Guetzkow has opened up interesting 
problems which deserve careful study and 
further development. Unfortunately the 
details of his analysis seem to need modi- 
fication. 

The basis of his analysis of coding 
errors is that any unit miscoded is equally 
likely to go into any one of the k-1 other 
categories. This amounts to assuming 
that a unit is either correctly classified, 
with 100% certainty, or it is classified at 
random. It seems to me that if a par- 
ticular item is to be misclassified, there are 
usually one or at most two wrong cate-
gories into which it will be placed. If this
is true, then the effective value of k in 
Guetzkow’s model will usually lie between 
2 and 3, no matter how many categories 
are used.


This suggestion is based on a supposi- 
tion, but it would be easy to test it by 


analyzing three or more coders. With 
three coders, where the three possibilities 
of classification—three alike, two alike, 
and all different—can arise, a large enough 
body of data might allow us to determine 
both p and the effective value of k experi-
mentally. (While the assumption that the
probability of correct coding is the same 
for each unit is no doubt fallacious, we 
may reasonably hope that its use is not 
leading us far astray.)
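
Under the model as described above, the agreement patterns can be written down directly. The sketch below further assumes that the coders act independently and that p is the same for every unit and coder; the function names are hypothetical, and the formulas are derived from those assumptions rather than taken from Guetzkow's paper.

```python
def pair_agreement(p, k):
    """Chance that two independent coders place a unit in the same category."""
    return p ** 2 + (1 - p) ** 2 / (k - 1)


def three_coder_pattern(p, k):
    """Probabilities of (three alike, two alike, all different) for integer k >= 2."""
    q = 1 - p
    alike = p ** 3 + q ** 3 / (k - 1) ** 2
    different = (
        3 * p * q ** 2 * (k - 2) / (k - 1)              # one correct, two distinct errors
        + q ** 3 * (k - 2) * (k - 3) / (k - 1) ** 2     # three distinct errors
    )
    return alike, 1 - alike - different, different


# Same accuracy p, different numbers of categories into which errors can fall.
for k in (2, 3, 10):
    print(k, three_coder_pattern(0.8, k))
```

With the observed frequencies of the three patterns in hand, p and k could be chosen to match them; the more the errors pile into one or two wrong categories, the smaller the fitted k would be.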

A possible substitute for Guetzkow’s 
figures 2 and 3 is obtained by using the 
angular transformation and the reciprocal 
of the sample size. Here the critical curves 
are almost straight. 

UNITIZING 

In his discussion of the reliability of 
the unitizing process, there seems to be 
another point of optimism vs. pessimism. 
Guetzkow assumes that the average num- 
ber of units found by each coder is the 
same. This assumption also seems over- 
optimistic. People differ in everything 
else, why not here? 

If we admit the possibility of such dif-
ferences, the appropriate analysis would
seem to run as follows: Take several 
masses of material of notably different 
lengths, let 
D = (units by coder 1) − (units by coder 2),
S = (units by coder 1) + (units by coder 2),
then 


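$$d = 2\,\frac{\operatorname{cov}(D,S) - \bar{D}}{\operatorname{var}(S) - \bar{S}}$$

and

$$\frac{\operatorname{var}(D)}{\bar{S}} - \frac{\operatorname{cov}(D,S) - \bar{D}}{2\bar{S}}\,d,$$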


where the means, variances, and covari- 
ances are taken over the values corre- 
sponding to the several masses of ma- 
terial, will estimate the relative difference 
in unitizing rate of the two coders, and 
the average percentage of boundaries in- 
serted by one coder and not by both, re- 
spectively. 

It always seems to be necessary in any 
such analytical situation, to obtain several
comparisons and analyze them jointly. In
the coding case, we examined agreement 
and disagreement of the coders on each 
unit. We were not content to compare 
merely the total number of units put in 
each category. So, too, in unitizing, we 
should analyze several masses of data for 
difference in the number of units found. 


DISCRIMINATION 

It may be worthwhile to indicate graph- 
ically what Brown has said, and to point 
out one extension of his analysis. We all 
know how to discriminate population A 
from population B when we have one 
measure, and the two populations have 
single-humped distributions with a rea- 
sonable overlap. We fix a cutting point, 
by procedures that we shall return to be- 
low, and “accept” all above the cutting 
point. To “accept” may be to admit to a 
training program or to an institution for 
treatment, but it is in terms of acceptance 
and rejection that we most commonly 
think of discrimination. But suppose the 
situation looks like Figure 1, where one
population, say B, is double humped, and 
overlaps both sides of A. Brown has told 
us that we may appropriately choose such 
pairs of points as P_1 and Q_1 (where the
two probability densities are in ratio 1:1),
P_2 and Q_2 (ratio 1:2) or P_{1/2} and Q_{1/2}
(ratio 1:1/2), and then accept all those
cases falling between these points. All
this assumes that we know the populations
well enough to neglect sampling effects. 


Brown is careful to say that he is not go-
ing to consider the case where sampling is
included. 

Figure 2 shows the contours of equal 
probability densities for simultaneous 
scores on two tests for two hypothetical 
populations. The solid curves show how 
population A is concentrated along a 
NNE-SSW line in the NW corner of the
figure. The dashed curves show the pe- 
culiar concentration of population B along 
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a roughly circular are in the SE corner. 
The heavy dots pass through the points 
where the probability densities of the two 
populations are the same. Using this 
curve as a cutting curve corresponds to 
choosing points P, and Q, in Figure 2. 
There are an infinity of such curves, one 
for each ratio of population densities. 
srown has told us that these curves are 
optimum, that when the two populations 
have normally distributed scores with dif- 
ferent means but the same variances and 
covariances, these curves reduce to the 
cutting lines determined by the familiar 
linear discriminant function. 

If we have more than two scores, we 
merely obtain cutting surfaces, or even 
hypersurfaces, each passing through all 
profiles for which the ratios of probability 
densities are the same. 
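
As a worked illustration of the reduction to the linear discriminant function, the following is a sketch only. The notation is assumed here and is not Brown's: mean vectors \mu_A and \mu_B, a common covariance matrix \Sigma, and k for the chosen ratio of densities. For two normal populations with equal covariances the logarithm of the density ratio is linear in the profile x, so each contour of constant ratio is a line, plane, or hyperplane:

```latex
\ln\frac{f_A(x)}{f_B(x)}
  = (\mu_A-\mu_B)^{\top}\Sigma^{-1}x
    - \tfrac{1}{2}\left(\mu_A^{\top}\Sigma^{-1}\mu_A
                       - \mu_B^{\top}\Sigma^{-1}\mu_B\right)
  = \ln k .
```

The quadratic terms in x cancel only because the two covariance matrices are equal; with unequal covariances the cutting surfaces are curved.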


In what sense are these curves opti- 
mum? “By their fruits ye shall know 
them’’—so let us pass to a figure which 
shows the “fruits” of any discriminating 
scheme, thought of as a selection device. 
It is in selection that psychologists have 
come oftenest to problems of discrimina- 
tion, and so this type of plot may help to 
make Brown's results intuitive. All we 
need do is plot the fraction of population 
B accepted against the fraction of popu- 
lation A accepted. Any discrimination 
scheme can be represented in terms of a 
score and a cutting point—many are most 
naturally in this form. As we change the 
cutting point, a score (compounded out of 
the various results we may have) or dis- 
criminator is represented by a point trac- 
ing out a curve, like APB in Figure 3. 
This curve shows just what we can do
with this discriminator by setting one cutting
point. Are we satisfied with the discrimination
at point P? Not if we can
have either point Q or point R in exchange.
By moving from P to Q we accept
fewer from population B and lose no
more from population A. By moving from
P to R we accept more A's and no more
B's. Thus if we can move to the right or
down on this figure, we will choose to do
so. The curve AEFB is the limit to our
motion in these directions. Any point between
E and F is preferable to P, but
which is better than the others? The answer
to this depends on other factors.
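
The plot just described can be sketched numerically. The example below is a minimal illustration, not part of Brown's analysis: the two normal populations, their means, and the function name `acceptance_curve` are all assumptions made for the sake of the sketch. It traces the curve by moving a single cutting point along one score and recording the fraction of each population accepted.

```python
from statistics import NormalDist


def acceptance_curve(dist_a, dist_b, cuts):
    """Return (fraction of A accepted, fraction of B accepted) per cutting point.

    Everyone scoring above the cutting point is accepted.
    """
    return [(1.0 - dist_a.cdf(c), 1.0 - dist_b.cdf(c)) for c in cuts]


# Hypothetical populations: A tends to score higher than B.
pop_a = NormalDist(mu=1.0, sigma=1.0)
pop_b = NormalDist(mu=0.0, sigma=1.0)

cuts = [x / 2.0 for x in range(-6, 9)]  # cutting points from -3.0 to 4.0
curve = acceptance_curve(pop_a, pop_b, cuts)

for c, (frac_a, frac_b) in zip(cuts, curve):
    print(f"cut={c:5.1f}  A accepted={frac_a:.3f}  B accepted={frac_b:.3f}")
```

Plotting the second column against the first gives one curve of the kind shown in Figure 3; a better discriminator gives a curve lying further to the right and lower.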

Brown has shown us that we can get
any point on AEFB by calculating λ and
using the correct cutting point (and if we
want points on BHA we can take the
other halves of the cuts!). Here

$$\lambda = \frac{\text{chance that an A will have a given profile}}{\text{chance that a B will have the same profile}}$$

To locate the cutting point, consider a
given profile, x, and let

f_A = chance an A will have profile x,

f_B = chance a B will have profile x,

p = chance an individual is an A,

W_A = cost of losing an A,

W_B = cost of taking a B,

N = number of cases.


There will be, on the average, N p f_A individuals
with profile x who are A's and
N(1-p) f_B who are B's. The cost of taking
everyone with profile x is, on the average,
N(1-p) f_B W_B. The cost of rejecting
everyone with profile x is, on the average,
N p f_A W_A. These are equal when

$$N(1-p)\, f_B W_B = N p\, f_A W_A$$

That is, when

$$\frac{f_B}{f_A} = \frac{p\, W_A}{(1-p)\, W_B}$$


Since the slope of the tangent to the envelope AEFB
is the ratio f_B/f_A at the cut-off point, this
can be easily interpreted graphically. This
is essentially Brown's discussion, rephrased.
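
The break-even rule can be checked with a short sketch. Everything specific here is an assumption made for illustration: the function name `accept_profile`, the two normal populations, and the cost values. The rule itself is the one stated above: take everyone with a given profile when the expected cost of taking them, (1-p) f_B W_B, is less than the expected cost of rejecting them, p f_A W_A (the common factor N cancels).

```python
from statistics import NormalDist


def accept_profile(f_a, f_b, p, w_a, w_b):
    """Return True when taking everyone with this profile is the cheaper choice."""
    cost_of_taking = (1.0 - p) * f_b * w_b   # B's wrongly taken
    cost_of_rejecting = p * f_a * w_a        # A's wrongly lost
    return cost_of_taking < cost_of_rejecting


# Hypothetical one-score populations, used only for illustration.
pop_a = NormalDist(mu=1.0, sigma=1.0)
pop_b = NormalDist(mu=0.0, sigma=1.0)

for x in (0.4, 0.6, 1.1, 1.3):
    f_a, f_b = pop_a.pdf(x), pop_b.pdf(x)
    equal_costs = accept_profile(f_a, f_b, p=0.5, w_a=1.0, w_b=1.0)
    costly_b = accept_profile(f_a, f_b, p=0.5, w_a=1.0, w_b=2.0)
    print(f"x={x}: accept with equal costs={equal_costs}, "
          f"with the cost of taking a B doubled={costly_b}")
```

For these assumed populations the density ratio f_A/f_B equals exp(x - 1/2), so the break-even score moves from 0.5 to about 1.19 when the cost of taking a B is doubled.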


SELECTING A GROUP OF FIXED SIZE 


Brown has discussed the case where
W_A and W_B are the relevant costs, and
the size of the selected group is not fixed.
Let us look at the other extreme, where a
fixed number n are to be taken, and where

W = cost of taking a B instead of an A,

W_T = cost of testing another man,

q_A = fraction accepted of A's tested,

q_B = fraction accepted of B's tested,

q = fraction accepted of all tested.


Clearly

$$q = p\, q_A + (1-p)\, q_B .$$


If M in all are tested, then n = Mq and
the total cost is

$$M W_T + M(1-p)\, q_B W + \text{constant}
  = n(1-p)\, W \, \frac{\dfrac{W_T}{(1-p)\, W} + q_B}{q} + \text{constant}.$$

In this form a graphical solution is easy,
and is shown in Figure 4. Note that

$$\frac{W_T}{(1-p)\, W} = \frac{\text{cost of testing one individual}}{\text{loss in accepting a random individual instead of an A}}$$

The optimum choice is now determined by
this ratio of testing cost to unselected cost.
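
In place of the graphical solution, the same minimum can be found by a simple search. The sketch below rests on assumptions not in the text: normal score distributions for the two populations, a grid of candidate cutting points, and the names `cost_per_selected` and `best_cut`. It minimizes the variable part of the total cost, (cost ratio + q_B) / q, where the cost ratio is the testing cost divided by (1-p) W.

```python
from statistics import NormalDist


def cost_per_selected(cut, p, cost_ratio, dist_a, dist_b):
    """Variable part of the total cost: (cost_ratio + q_B) / q."""
    q_a = 1.0 - dist_a.cdf(cut)          # fraction of A's accepted
    q_b = 1.0 - dist_b.cdf(cut)          # fraction of B's accepted
    q = p * q_a + (1.0 - p) * q_b        # fraction of all tested accepted
    return (cost_ratio + q_b) / q


def best_cut(p, cost_ratio, dist_a, dist_b, cuts):
    """Return the candidate cutting point with the smallest cost."""
    return min(cuts, key=lambda c: cost_per_selected(c, p, cost_ratio, dist_a, dist_b))


# Hypothetical populations and candidate cutting points.
pop_a = NormalDist(mu=1.0, sigma=1.0)
pop_b = NormalDist(mu=0.0, sigma=1.0)
cuts = [x / 10.0 for x in range(-30, 41)]   # -3.0 to 4.0 in steps of 0.1

cheap_testing = best_cut(0.5, 0.01, pop_a, pop_b, cuts)
dear_testing = best_cut(0.5, 1.0, pop_a, pop_b, cuts)
print("best cut when testing is cheap:", cheap_testing)
print("best cut when testing is dear: ", dear_testing)
```

When testing is cheap relative to the loss from an unselected man, it pays to test many and set a high cutting point; when testing is dear, the best cutting point falls toward accepting nearly everyone tested.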


GENERAL COMMENT 


Throughout the detailed discussion I 
have suggested places where the statis- 
tician and the psychologist could work to- 
gether in the development of more detailed 
models. The most striking thing about the 
collection of papers is that nearly everyone 
seems to wish to gather data without ex- 
plicit hypotheses, and then trust to statis- 
tics to produce meaningful results. There 
is a striking absence of the classical scientific
method of first hypothesis, then experiment,
producing confirmation or rejection
of the hypothesis. This is good statistical
scientific method, indeed the
standard method. (The absence of this
method was pointed out to me by Stephenson.)

It may be worthwhile to inquire why 
this is so. Two alternative explanations 
come to mind at once: (1) the theoretical 
considerations and speculations of clinical 
psychology are so few and weak that no 
reasonable hypotheses can be made (this 
view I should be reluctant to take!), and
(2) the authors hold the view, often re- 
ferred to as the tabula rasa fallacy, that 
an objective scientific approach is one with 
an empty mind and no hypotheses or pre- 
conceptions. (This is certainly not the 
way that progress has been made in physical
and biological science!)

Let us take a field of reasonable com- 
plexity where great progress has been 
made, partly with the help of statistics, 
and see how successful experiment has 
been divided among the three types: 


(1) unplanned experiments, with statistics
    responsible for trying to find the
    meaning without clues,
(2) simple hypotheses, directly tested,
    with statistical principles used to
    assay the extent of confirmation,
(3) mathematical models, where the experiment
    is intended to estimate certain
    quantities and all the available
    statistical power is used to
    increase the precision of these estimates.


These three types have followed one an- 
other in agriculture in the order given. 
The first type dates back to the days 
before the agricultural experiment station, 
when all that could be done was looking 
around from farm to farm. This type lost 
its value between 150 and 100 years ago! 
The next type came to maturity with the 
founding of Rothamsted in 1843 and
reached its height in the decade intro- 
duced by Mercer and Hall's famous work 
and Student’s early papers. The broad, 
simple questions were adequately an- 
swered here for the first time. 

The third type was introduced into agri- 
culture by R. A. Fisher, whose simplest 
and basic designs—randomized blocks 
and the Latin square—owe their value to 
the effectiveness of the additive model that
underlies their analysis! The detailed
questions of agriculture are being answered
by techniques of this sort and experiments
of the third type.

What is the lesson for clinical psychology?
If the simple, broad questions are
still open, they should be attacked by 
simple experiments based on definite hy- 
potheses. When this phase is well along, 
quantitative models can be set up, possibly 
in cooperation with statisticians, and 
efficient experiments to settle the details 
can be planned, carried out, and appro- 
priately analyzed. 

To give the analysis of experiment 
over to statistical methods of wide application
without regard to psychological
hypotheses is to turn psychology over to
the statistician, who already bears a heavy 
enough burden. The statistician will not 
make a really good psychologist, but if 
the job is abandoned to him he will do 
what he can. It is the psychologist’s re- 
sponsibility to make meaningful hypothe- 
ses and to find ways in which their con- 
firmation can be approached. The statis- 
tician can then help him efficiently and 
effectively. An open mind will have a place 
for hypotheses—where they can be useful 
and from which they can be ejected with- 
out undue pain. Neither an empty mind, 
nor a rigid mind will serve the experimenter
adequately.

In the first paper Baldwin has discussed
problems in handling the time dimension 
in the statistical analysis of data regarding 
a single individual. In discussing temporal 
trends he has distinguished between the 
two types of dependence of the data with 
respect to time series. The first of these he 
calls situational dependence. This is de- 
fined as the type of relation between two 
sets of measures obtained over a period 
of time when both are related to a third 
factor which is correlated with position in 
the time series. In this case the behavior 
observed during the second period is 
partly a function of the fact that it is the 
second observation—that the situation 
was experienced before. However, the po- 
sition of the obtained measurement in the 
distribution of the first set of observations 
has no effect on the position a similar ob- 
servation will take in the distribution ob- 
tained during the second period of meas- 
urement. The other type is called measure- 
to-measure dependence. This type of dependence
is defined as that in which there
is dependence not only because the measures
are associated with time-correlated factors but
also because the individual's behavior during a later
period of observation is in part a function
of the way in which he behaved during
previous periods of observation. In this
situation the measurement obtained the
first time actually influences what it will
be the second time.

Certainly this is a very desirable distinction
which Baldwin has suggested and it
appears that much is to be gained from 
separating the variance in terms of factors 
of this type. The methods of analysis of 
variance and covariance lend themselves 
especially well to such separation as is 
proposed. 

The paper by Kubis indicates that the 
techniques being used at Fordham for de- 
tecting deception have yielded diagnoses 
having a useful degree of accuracy. The 
diagnoses are based on the magnitude of a 
series of ratios of the physiological re- 
sponses to critical questions regarding the 
crime being investigated as compared with 
paired responses to questions regarding 
points in the life history of the subject 
which would produce an emotional re- 
sponse of anxiety, resentment, or embar- 
rassment. For the innocent the ratios 
while sometimes starting high tended to 
converge rapidly to unity. For the guilty 
individual attempting deception no such 
adaptation effects were observed. This 
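technique raises the general problem of
standard errors or variance errors for ratios.
The formula for the variance error
of a quotient:

$$\sigma_{x/y}^2 \approx \frac{\bar{x}^2}{\bar{y}^2}\left(\frac{\sigma_x^2}{\bar{x}^2} + \frac{\sigma_y^2}{\bar{y}^2} - \frac{2\, r_{xy}\,\sigma_x \sigma_y}{\bar{x}\,\bar{y}}\right)$$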


appears to be applicable here. This of 
course has certain limitations such as the 
requirement that both x and y be positive 
and that the standard deviation of y be 
small in comparison with y. In most cases 
the distribution of x/y is such that the 
normal distribution will not give a very 
accurate basis for testing the null hypoth- 
esis. The paper does not discuss studies 
of the reliability or consistency of the 
measures used. It would appear that split 
test procedures would be appropriate and 
informative in this situation. The point 
deserving most emphasis in commenting 
on this study is the commendable use of a 
criterion of success. The empirical valida- 
tion made possible by Kubis’ use of indi- 
viduals of relatively well established guilt 
and innocence is a procedure deserving 
much greater use in research in clinical 
psychology. It is hoped that the example 
of this study will lead others to seek such 
objective criteria to test their procedures. 
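
The variance error of a quotient mentioned above can be illustrated with the usual first-order approximation for a ratio x/y. This is a sketch under stated assumptions: the function name `ratio_variance` and the numerical values are hypothetical, and the approximation is reasonable only under the limitations already noted (positive x and y, and a standard deviation of y that is small in comparison with y).

```python
import math


def ratio_variance(mean_x, mean_y, sd_x, sd_y, r_xy):
    """First-order approximation to the variance of the quotient x / y."""
    ratio = mean_x / mean_y
    relative = (
        (sd_x / mean_x) ** 2
        + (sd_y / mean_y) ** 2
        - 2.0 * r_xy * sd_x * sd_y / (mean_x * mean_y)
    )
    return ratio ** 2 * relative


# Hypothetical response magnitudes: critical question (x), control question (y).
uncorrelated = ratio_variance(10.0, 5.0, 1.0, 0.5, r_xy=0.0)
correlated = ratio_variance(10.0, 5.0, 1.0, 0.5, r_xy=0.6)
print("variance of x/y, r = 0.0:", uncorrelated)
print("variance of x/y, r = 0.6:", correlated)
print("standard error, r = 0.6: ", math.sqrt(correlated))
```

A positive correlation between the paired responses reduces the variance of the ratio, which is one reason for pairing critical and control questions.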

The paper of Kogan and Hunt discusses 
some problems related to testing hypoth- 
eses regarding sets of judgments made by 
k judges regarding n things. It also in- 
volves the problems of judging these n 
things t times. This problem brings up
some questions regarding number of inde- 
pendent observations and statistical infer- 
ences which arise quite frequently. It 
seems worth emphasizing that one of the 
primary issues here is not statistical at all 
but involves the precise definition of the 
inference or hypothesis to be tested. Do 
we wish to know in the example given by 
Kogan and Hunt whether it is reasonable 
to suppose that given 9 judges exactly like 
these and training precisely like that re- 
ceived by each of them that the judgments 
made before training will be indistin- 
guishably different from the judgments 
made after training on a very large number 
of cases? In other words is our inference 
to be one from the sample of 38 cases to 
an indefinitely large sample? Or are we 
interested in knowing whether it is reasonable
to suppose that given these 38
cases and this particular training course
that a very large group of judges would 
make judgments indistinguishably differ- 
ent after training from those made by the 
group before training? If we decide that 
what we are interested in is this latter 
question our estimate will be based on the 
comparisons of the sets of judgments of 
the 9 judges before and after training. 
These judgments can be compared in a 
number of ways. One way is in regard to 
relative score and this is best done by cor- 
relating the judgments as reported in 
Table 1. Another method of comparison 
is with respect to the agreements of the 
9 judges in terms of absolute score units. 
The variability of such agreements is 
shown for these data in Table 2. In either 
procedure it is important to note that the 
sampling errors of the observed values 
must be calculated using the number of in- 
dependent observations as 9 minus the 
number of restrictions that have been im- 
posed on these observations. 

In his discussion of statistical problems 
involved in Rorschach patterning Rabin 
reports that attempts to use clusters of 
differentiating factors for the identification 
of such disorders as neuroses and organic 
involvements have not been very success- 
ful. A method recently developed by Dr. 
William Kogan to be published soon ap- 
pears promising for such problems. In 
this approach each individual's pattern of 
scores is compared with the pattern of 
every other individual. The absolute de- 
viations between scores are summed. In 
this way stable patterns within various 
categories can be easily detected. 
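
The comparison of every pattern with every other by summed absolute deviations can be sketched as follows. This illustrates only the general idea described above; it is not Kogan's procedure, which had not yet been published, and the names `profile_distance` and `pairwise_distances` and the scores are hypothetical.

```python
from itertools import combinations


def profile_distance(a, b):
    """Sum of absolute deviations between two score patterns of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pairwise_distances(profiles):
    """Compare each individual's pattern with that of every other individual."""
    return {
        (i, j): profile_distance(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


# Hypothetical patterns of four scores for three individuals.
profiles = {
    "s1": [3, 5, 2, 8],
    "s2": [4, 5, 1, 7],
    "s3": [9, 1, 6, 2],
}

distances = pairwise_distances(profiles)
for pair, d in distances.items():
    print(pair, d)
```

Small sums mark individuals with similar patterns; a group of individuals who are all close to one another suggests a stable pattern within a category.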

In his discussion of statistical methods 
for multi-score tests Cronbach uses the 
geometry of hyper-space to define the 
problems. If there are k scores an indi- 
vidual’s results on all the tests can be ex- 
pressed as a single point in k-space. Prob- 
lems with respect to scores and patterns 
then reduce to questions of the relative 
density of points in various regions in k- 
space for specified groups. Several methods
of handling multi-score data are presented.
To make the problem of dealing
with items in k-space practical, it is proposed
that the various score scales be dichotomized.
This is certainly a good expedient
since no systematic bias is introduced
and with large populations existing
patterns should emerge. In discussing
multiple regression procedures the ob- 
servation is made, “when we plunge from 
six dimensions to one we have discarded 
a tremendous amount of information.” 
This is true if curvilinear relations and 
non-normal distributions are involved. 
However, extensive study of large samples 
in research in the services during the past 
war failed completely in establishing an- 
ticipated curvilinear relationships. It 
therefore appears that perhaps we 
shouldn't sit back and complain about the 
inadequacy of present statistical tools but 
get all we can out of those we have until 
better ones are developed. Maybe we'll 
find that an insignificant rather than a 
tremendous amount of information is discarded
by using a simple linear combination
of our scores. Perhaps our common
sense, that tells us that the patterns set 
equivalent by using linear combinations 
are clearly different, is in need of further 
facts to rationalize empirical findings. The 
methods of pattern tabulation and match- 
ing as described by Cronbach appear 
promising and deserve wider application 
to problems in the clinical field. 
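
Pattern tabulation with dichotomized scales can be sketched in a few lines. The example is an assumption-laden illustration rather than Cronbach's own procedure: the choice of the median as the cutting point on each scale, the function names, and the scores are all hypothetical. Each individual's k scores are reduced to a pattern of highs and lows, and the patterns are then counted.

```python
from collections import Counter
from statistics import median


def dichotomize(scores, cut_points):
    """Reduce a profile to a pattern of 1 (above the cut) and 0 (at or below)."""
    return tuple(int(s > c) for s, c in zip(scores, cut_points))


def tabulate_patterns(profiles, cut_points):
    """Count how many individuals fall in each dichotomized pattern."""
    return Counter(dichotomize(p, cut_points) for p in profiles)


# Hypothetical profiles of three scores for six individuals.
profiles = [
    (12, 30, 7), (15, 22, 9), (8, 35, 4),
    (14, 28, 8), (9, 33, 5), (16, 21, 10),
]

# Assumed rule: cut each scale at its median over all individuals.
cut_points = [median(column) for column in zip(*profiles)]
table = tabulate_patterns(profiles, cut_points)

for pattern, count in table.items():
    print(pattern, count)
```

With k dichotomized scales there are 2^k possible patterns, so the tabulation is practical only when the groups are large relative to that number; comparing the tables for two specified groups shows which regions of k-space are denser for one than for the other.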

In conclusion it may be said that the 
papers presented in this first session of 
the panel are very encouraging. The 
awareness and definition of problems is a 
long step toward obtaining solutions. Clinical
psychology appears to be making important
progress in improving its research
methodology. It is hoped that sessions such
as this will accelerate this movement.


INTRODUCTORY NOTE


Extensive research projects in group
therapy are under way by many investigators.
A search for a single bibliographical
source as a reference guide for
the worker in group therapy proved to be
in vain. Unfortunately, the published
bibliographies on group therapy present,
in the main, only those references
of authors who are workers in particular
“schools” of group therapy, at the sacrifice of
excellent contributions to the literature of
group therapy by others in the field.
This bibliography is meant to fill the
need for a single comprehensive
source of reference for the student in
group therapy.

Only those references were included 
where articles were accessible. No at- 
tempt was made to group the references 
under various categories or main areas of 
interest, chiefly because such bibliog- 
raphies usually restrict the worker ac- 
tively engaged in one type of group ther- 
apy from becoming acquainted with the 
contributions made by related types of 
group therapy. 

This bibliography was planned to meet 
the requirements of workers in group 
therapy whose primary interests are in 
mental hygiene, guidance, group manage- 
ment, or rehabilitation. It covers rele- 
vant articles and books that appeared up 
to June, 1949.


INTRODUCTION


Good work habits are an essential foun- 
dation of mental health. Although 
there are a few people in our civilization 
whose circumstances permit a life of idle- 
ness, people do not fulfill their creative 
potentials or achieve maximum mental 
health unless they are able to make their 
contribution through efficient work habits. 
The life which is unable to make such 
contributions loses much of its significance 
and is sterile to the degree to which it 
becomes stultified and parasitic. It is 
therefore important to explore the ways in 
which good work habits contribute to 
mental health and to discover how poten- 
tialities may be actuated. The validity of 
this viewpoint is partially substantiated in
the emphasis which modern psychiatry
has placed upon the work history as an
indicator of mental health and upon occupational
(work) therapy as one of the
principal methods of psychiatric treatment.
The purpose of this paper is to
outline the role of work habits in the 
etiology of personality maladjustment, 
and to present an analysis of the factors 
contributing to efficient work habits with 
suggestions concerning how they may be 
learned. 

One of the most significant develop- 
ments in the evolution of modern civiliza- 
tion has been the rise of the complex 
societies in which free citizens work co- 
operatively in more or less regimented 
industrial systems. Increasingly rare are 
the situations in which a person can work 
in geographic or social isolation at his 
own speed and left to his own resources. 
To an increasing degree, each person 
must face the problems of adjustment to
complicated social and economic systems
which place great stress on productivity, 
creativeness, emotional stability and good 
interpersonal relations. In the past, prob- 
lems relating to the training of young 
people in good work habits have been left 
to chance or dealt with only haphazardly. 
While the relation of good work habits to 
adjustment has been generally accepted, 
instructional techniques have not been 
sufficiently individualized so that the vo- 
cational strengths and weaknesses of each 
person are identified and appropriate 
remedial training instituted. 


ETIOLOGIC CONSIDERATIONS


Poor work habits may be either an
etiologic cause or a symptom of personality
maladjustment, and it is therefore of
diagnostic importance to determine etio- 
logic relationships in each case. It is rec- 
ognized that any malignant personality 
disorder will be reflected in the work his- 
tory. Where a person has had a good 
work history before becoming mentally 
ill, it may be assumed that treatment di- 
rected toward the primary mental dis- 
order will modify the basic etiologic cause 
with the result that work habits will im- 
prove as mental health is recovered. It 
is significant that good work habits may 
operate to preserve the integrity of a sick 
personality and may even permit a very 
unhealthy or psychotic personality to con- 
tinue functioning outside an institution 
even in the presence of grossly pathologic 
behavior. 

Example: H. L., male, age 38, single, mill la- 
borer. Has worked steadily at the same job 
in the mill since leaving the 8th grade. The 
foreman has always regarded him as one of 
his most dependable men operating a machine. 
About two years ago, H. L. developed mild 
schizophrenia of mixed type. Became seclu- 
sive, withdrawn and with marked mood- 
thought dissociation. Apparently hallucinat- 
ing at times, observed to mumble to himself 
incoherently. In spite of this psychotic be- 
havior, created no social problem and con- 
tinued working mechanically. Lives alone in 
a room, his landlady putting up his meals. 
After work returns to his room and is never
seen until the next morning when he goes to 
work. He rarely speaks to anyone at work. 
Fortunately, his machine is spatially isolated 
and he can carry on alone at his own rate. 
The foreman states that he works like a robot 
efficiently enough so that there has been no 
cause to discharge him.


Every clinician knows of many cases 
with marked personality disorders in 
which a stable routine of work habits 
appears to be the only thing which keeps
the person from a complete breakdown. 
Conversely, occupational therapy fre- 
quently succeeds in reestablishing a suffi- 
ciently normal routine of living so that 
the psychotic person gradually resumes 
productive patterns of behavior. 

Of greater potential significance in the 
prevention of personality maladjustments 
is the situation in which failure to learn 
efficient work habits is the direct cause of 
morbid personality reactions. In our 
opinion, the failure of a person to make 
an adjustment in the role of a “worker” 
may be just as productive of devastating 
personality reactions as his failure to succeed
in his “social” or “sexual” roles.
Psychoanalytic preoccupation with dis- 
orders of psychosexual development has 
operated to deemphasize the etiologic role 
of situational vocational factors in per- 
sonality development to the point where 
it has been sometimes tacitly assumed 
that vocational problems will spontane- 
ously resolve themselves if emotional 
depth factors are treated. Alfred Adler 
gave more detailed attention to situational 
factors of vocational adjustment but 
failed to analyse etiologic mechanisms in 
sufficient detail. The modern vocational 
guidance movement has perhaps contrib- 
uted the greatest understanding of the 
problem, but its efforts have tended to be 
directed more to finding a job for the 
person rather than fitting the person to 
the job. 

Theoretically, the ability to work effi- 
ciently is a function of a complexity of 
factors which appear to be learned rather 
than instinctive or intuitive. In all high- 
er species of animals showing working 
behavior, it usually is necessary to provide 
intensive training in the desired patterns 
to “break” the animal into proper work 
habits. Experiences in “breaking” horses, 
training hunting or working dogs, oxen, 
etc., indicate that animals in the “wild” 
state may survive very well when left to 
their own resources but become progres- 
sively more untrainable and resistive to 
regulation with increasing age as their 
own habitual reactions become established.
Even in the “working” species known to
be susceptible to training (hunting dogs), 
animals which fail to be trained early in 
life and are allowed to roam uncontrolled 
soon become “outlaws” and become re- 
fractory to all further training attempts. 
Analogous behavior is seen in human 
“wild children” and adults who have 
never learned good work habits. Persons 
who become vagabonds as children, or 
whose early circumstances made work 
unnecessary, find it very difficult to exert 
the self-control necessary to develop good 
work habits later in life. Such persons 
may develop severe neurotic reactions and 
be totally unable to remain long in work- 
ing situations. Unfortunately, the psycho- 
dynamics of such personality reactions 
are poorly understood because of lack of 
scientific data relating to work habits. 
Theoretically, the etiologic pattern 
whereby poor work habits lead to person- 
ality maladjustments may be formulated 
as follows. Under present world con- 
ditions, it is expected that each person 
will be sufficiently productive to support 
himself and his dependents. While mod- 
ern civilization will generally see that no 
one starves through inability or disin- 
clination to work, certainly the higher 
things of life can only be obtained through 
work. The ability to work efficiently thus 
becomes a prime essential for survival 
and self-enhancement. During the long 
maturation process characteristic of hu- 
man childhood, the young person is sup- 
posed to receive an intensive training in 
self-control and in learning the behavior 
patterns which contribute to productivity. 
During childhood he is usually protected 
carefully and is permitted to live on what- 
ever standard of luxury his parents are 
able to provide. At maturity, however, 
he is expected to go out on his own and 
his economic status will henceforth de- 
pend upon the productivity of his own 
efforts. The majority of people are sufficiently
well trained so that they are able
to pass this hurdle without serious maladjustment,
but a large number of “marginal”
workers fail to make the adjustment
and become involved in a circular
reaction involving economic and social
deprivation, frustration and unhealthy
personality reactions. With infinite variations
in the individual patterns of this
syndrome, the typical end result is a de- 
feated person with deflated ego who is 
unable to hold his own in the competition 
of life and who may terminate as a vaga- 
bond, a relief client, or a suicide. 

Little evidence is available concerning 
the relative importance of constitutional 
vs. learned factors in the etiology of this 
pattern. In our experience which in- 
cludes a representative sample of mental 
defectives, relatively few persons are con- 
stitutionally incapable of learning to 
work. Even the low grade imbeciles de- 
rive evident satisfaction from perform- 
ing simple tasks well. Personality stud- 
ies of poor workers suggest that the syn- 
drome results either from the failure to 
receive adequate training in youth or 
from personality disorders (particularly, 
emotional instability) which distract or 
prevent the necessary concentration. In 
general, it may be stated that learning to 
work productively constitutes one of the 
most difficult tasks confronting any in- 
dividual. The aptitude frequently does 
not become developed until the third or 
fourth decades or even later. Viewed 
broadly, the primary objective of education
is to achieve good work habits with
the secondary objective of acquiring 
knowledge being almost automatically as- 
sured if the first is satisfied. 


AN HYPOTHESIS CONCERNING TRAINING
FOR WORK


There is no reason to suppose that good 
work habits are acquired by any other 
method than intensive training accord- 
ing to the psychology of learning. Every 
learning process involves a number of 
standard operations as follows: 


1. Motivation. Positive attitudes and incentives
provide the basic substratum of inner
need which is necessary for efficient learning.

2. Analyses of steps necessary to learn. Unless
learning is to be by trial and error, there
must be insight into the nature of the problem.

3. Mechanics of component operations. After
identifying necessary steps, information must
be acquired concerning suitable methods for
accomplishing them. Ideally, this information
is made available easily by the teacher.

4. Practice opportunities. Situations must be
set up to provide opportunity for the practice
which makes perfect.

5. Evaluate progress. Efficient learning does
not take place in the absence of opportunity
for checking results in order to eliminate
error.


The factors which will be outlined in 
this paper are considered to underlie
nearly all work situations in the sense 
that they are sufficiently general as to be 
universally applicable. There are many 
more specific factors which may be re- 
quired in accomplishing the mechanics of 
specialized types of work, but these are 
more properly assigned to industrial psy- 
chology. 

It will be noted that primary emphasis 
has been placed on the relation of suitable 
attitudes and “sets” in the work process. 
The basic objective is to build up healthy 
attitudes toward work implemented by 
certain “sets” or typical ways of think- 
ing about how to solve problems most 
efficiently. Although these attitudes and 
“sets” are acquired very slowly by the 
average person, performance becomes 
habitual and automatic with overlearning. 
Once good work habits have become
deeply engrained, they usually persist until
senility and may compensate importantly
for declining mental abilities. It
is common to see the habit of work so
firmly engrained in aged people that they
continue to work with more or less efficiency
until incapacitated. Clinical experience
suggests that these attitudes and
perience suggests that these attitudes and 
“sets” are learned most easily in early 
life, and that it may be very difficult for 
the adult to unlearn inefficient habits and 
unhealthy attitudes when the retraining 
process is undertaken later in life. The 
child usually learns them quickly since he 
knows no different and accepts what he 
is told. 


ANALYSIS OF EFFICIENT WORK
HABITS


A workshop was conducted in the 
spring of 1949 in cooperation with the 
Springfield, Vermont, School Department 
on the problem of the diagnosis and re- 
medial training of efficient work habits. 
The first project was to differentiate and 
define operationally a group of behavior 
patterns which in the opinion of experi- 
enced school administrators and teachers 
were essential for efficient work. Following
identification of a list of 16 desirable
attributes, an attempt was made to psychologically
analyse the specific factors or
learning patterns etiologic to efficient per- 
formance of each of the desired qualities 
of work. A third project was to list the 
techniques whereby each operation could 
be quantified objectively through rating 
scales and other formal tests. Finally, a 
number of recommendations were formu- 
lated concerning remedial training meth- 
ods for use with persons who were identi- 
fied as being inefficient with relation to 
any of the multiple factors contributing 
to good work habits. Plans were con- 
sidered whereby an entire school system 
might become oriented along these direc- 
tions so that a continuing cooperative ef- 
fort might be made in all school depart- 
ments to give constant attention to the 
basic problem of developing good work 
habits. For the purposes of this paper, 
the results of each of the above projects 
have been combined in a single outline 
which summarizes for each factor (a) an 
operational definition, (b) psychological 
analysis of factors involved, (c) methods 
for objective measurement, and (d) sug- 
gestions for remedial techniques. 

Factor 1. Seriousness of Purpose. Defined as 
general attitudes towards work, the role 
played by work in the individual philosophy 
or way of life. Evans(3) gives an admirable 
statement of the problem in his discussion 
of the changed attitude toward work in 
Great Britain. 

Psychological Analysis: Identification of the 
specific positive and negative attitudes which 
determine the person’s orientation to work 
as an area in life. These attitudes may be 
verbalized into a formal philosophy or may 
be simply revealed in the valence of the 
person’s expressive behavior. The person- 
ality dynamics underlying these attitudes 
may be elicited easily or may require depth 
analysis to uncover affective factors. 

Measurement Methods: 

1. Systematic sampling of attitudes by di- 
rect and indirect methods. What is the 
person’s conception of the values of 
work? Why does he work? 

2. Subjective reports of satisfyingness of 
work. Amount of spontaneous interest. 
Rating scale of eagerness for work. 

3. Boredom and fatigue as measures of 
negative attitudes. 

Remedial Methods: 

1. General orientation and philosophical 
discussions of the values of work in life. 
Emphasis on work as a positive source 
of health, happiness, productivity, creativeness, 
goals and purposes in life, character 
building, and last but not least, as 
a healthful way of passing time. 

2. Criticism of such attitudes as that 
“work is something to avoid,” drudgery, 
unhealthy, demeaning, suitable only for 
members of a lower caste, etc. 

3. Gently but firmly holding children in 
work situations long enough for them to 
feel some of the satisfactions of productivity, 
creativeness, etc. 

4. Using applied psychology to build up the 
prestige of the good worker. 


Factor 2. Industry. Defined as effective ap- 
plication per work period. 

Psychological Analysis: Factors to be meas- 
ured are (1) per cent of time spent work- 
ing, (2) intensity of effort (energy expen- 
diture), and (3) speed, or time per unit of 
work. 

Measurement Methods: 

1. Per cent of allowed time spent at work. 

2. Per cent of attendance (days absent from 
work). 

3. Speed of work at tasks within ability. 

4. Effort (total mobilization of resources). 

5. Overall measure of output level (after 
Burtt(2)): 

\[
\text{Output level} = \frac{\text{Difficulty} \times \text{Amount} \times \text{Quality}}{\text{Time}}
\]
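
The overall output measure attributed to Burtt may be illustrated by a short calculation. The following is a minimal sketch only: the function name `output_level`, the choice of rating scales (difficulty and quality rated on arbitrary positive scales, amount in units completed, time in hours) and the sample figures are assumptions introduced for illustration and are not given in the source.

```python
def output_level(difficulty: float, amount: float, quality: float, time: float) -> float:
    """Return (difficulty * amount * quality) / time.

    All four inputs are hypothetical ratings or counts; the source names
    the four terms but does not specify how each is to be scaled.
    """
    if time <= 0:
        raise ValueError("time must be positive")
    return difficulty * amount * quality / time


# Hypothetical comparison of two work periods on a task of equal difficulty.
careful = output_level(difficulty=3, amount=10, quality=4, time=5)
hurried = output_level(difficulty=3, amount=12, quality=2, time=4)
print(careful, hurried)
```

In this invented case the hurried period yields more units in less time, yet the lower quality rating gives it the smaller index, which is the kind of trade-off the combined measure is meant to expose.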

Remedial Methods: 

1. Emphasize relation between absenteeism 
or “slow-downs” and loss of productiv- 
ity. 

2. Analyse attendance record and per cent 
of time spent at work to show actual 
amount of time lost. 

3. Efficiency engineering techniques taught 
to individual worker so that he may co- 
operatively improve work routines. 

4. Teaching the attitude of continual self- 
criticism so the person actually works 
to discover and remedy his own inade- 
quacies. 

Factor 3. Initiative. Defined as effective self- 
starting; the “mental set” of getting started 
at a task without lost time and effort. 

Psychological Analysis: 

1. The set of “willingness,” starting activ- 
ity without prompting from without. 
2. The set of “getting started immediately.” 
3. Inventiveness and resourcefulness in ac- 
complishing difficult tasks: using imagi- 
nation and initiative in solving obstacles. 

4. Explores all alternatives. 

Methods of Measurement: 

1. Objectively measure individual habits 
relating to (a) failure to start (procras- 
tination) and (b) time wasted getting 
started. 

2. Amount of pressure or prompting to get 
started. 

3. Amount of spontaneous (unprompted) 
gainful activity, particularly working, 
reading, etc. 

4. Analysis of daily activities with ratings 
for initiative and other factors. 

Remedial Methods: 

1. Attitudinal orientation relating to the 
role of initiative in good habits. 

2. Individual observations and counseling. 

3. Rewards and other incentives for efficient 
self-starting. 

4. Encouraging independent reading, hobbies, 
etc. 


Factor 4. Perseverance. Defined as the set 
of persisting in anything undertaken, par- 
ticularly in the face of difficulties or ob- 
stacles. 

Psychological Analysis: This factor is ap- 
parently related to a group of attitudes re- 
lating to one’s self, e.g., “Nothing is going 
to stop me”; “If at first you don't succeed, 
try, try again.” 

Measurement Methods: 

1. Sample attitudes relating to persever- 
ance. 
2. Evaluate behavior in the face of ob- 
stacles. Does he give up, or doggedly 
continue to explore alternatives? 

3. Amount of prompting necessary to keep 
at work. 

Remedial Methods: 

1. Ideational orientation to the problem of 
how to overcome difficulties. 

2. Present problem situations of gradually 
increasing difficulties until person learns 
what can be accomplished by persistence. 
Do not overwhelm with too difficult 
tasks at first. 

3. Incentives for difficult tasks persisted in. 


Factor 5. Concentration. Defined as the act 
of focussing attention upon a restricted 
range of elements in a total situation, par- 
ticularly the ability to withstand distrac- 
tion. 

Psychological Analysis: 

1. Attention habits. After Burtt(2), it is 
possible to train and improve the atten- 
tion level at which one works by (a) 
place habits involving a regular and suit- 
able place to work, (b) time habits in- 
volving a regular schedule or time to 
work, and (c) repeated practice in the 
act of focussing attention on elements 
which would normally be in the ground 
of attention. 

2. The “set” of concentration. Self-signaling 
cues. 

3. Identification and modification of dis- 
tracting elements in the environment. 

Measurement Methods: 

1. Per cent of time paying attention, ob- 
servations of lapses. 

2. Concentration in the presence of con- 
trolled distractions. 

Remedial Methods: 

1. Training habits of attention. Learning 
place and time habits. Removing need- 
less external distractions. 

2. Practice in the “set” of concentration. 
Plotting a learning curve in avoiding 
lapses of attention. 

Factor 6. Responsibility. Accepts obligation 
to be reliable and dependable. This involves 
sets and attitudes relating to concept of re- 
sponsible self and relations with others. 

Psychological Analysis: 

1. Reliability or consistency of perform- 
ance. Always tries to do best. 

2. Willingness to assume responsibility. 
Understands and accepts the necessity 
for assuming obligations, to take on 
work. 

3. Dependability. Performance after assum- 
ing responsibility. Being where he 
ought to be, and doing what he ought 
to do. 

4. Judgment in making commitments with- 
in ability. 

Measurement Methods: 

1. Evaluate consistency of performance. 
Per cent of variability in work output. 

2. Rating scales on punctuality, willingness 
to take on responsibility, judgment in 
making commitments, dependability, etc. 

Remedial Methods: 

1. Educational insistence on reliability and 
dependability. 

2. Stimulating self-concepts and attitudes 
of responsibility. 

3. Teach children to evaluate limitations 
of self. 

4. Provide chances to accept responsibility. 
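
The measure “per cent of variability in work output” listed under Factor 6 is not defined further in the source. One possible reading, offered here purely as an assumption, is the coefficient of variation: the standard deviation of output over a series of work periods expressed as a percentage of the mean output. The function name `percent_variability` and the sample figures below are hypothetical.

```python
from statistics import mean, pstdev


def percent_variability(outputs):
    """Standard deviation of output as a percentage of mean output.

    This is one assumed interpretation of "per cent of variability";
    the source does not name a specific statistic.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two work periods")
    average = mean(outputs)
    if average == 0:
        raise ValueError("mean output is zero; percentage is undefined")
    return 100.0 * pstdev(outputs) / average


# Hypothetical units of work completed on five successive days.
steady_worker = [20, 21, 19, 20, 20]
erratic_worker = [30, 8, 25, 12, 25]

print(round(percent_variability(steady_worker), 1))
print(round(percent_variability(erratic_worker), 1))
```

Both invented workers average 20 units a day, so the index separates them on consistency alone, which is the quality the factor is concerned with.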


Factor 7. Influence. Defined as the effect on 
others in the work group; leadership. 

Psychological Analysis: 

1. Dominance-submission ratios in rela- 
tion to group structure. 

2. Acceptance or rejection by other mem- 
bers of group. 

Measurement Methods: 

1. Sociometric analysis in typical group 
work. 

2. Analysis of leadership roles. 

3. Evaluation of concepts of self in relation 
to leadership. 

Remedial Methods: 

1. Training in leadership. Giving each 
person opportunities for leadership. 

2. Personality counseling as a means of 
dealing with affective reactions and in 
discussing progress. 


Factor 8. Concern for Others. Direction of in- 
fluence on other workers. 

Psychological Analysis: 

1. Personality factors predisposing to dif- 
ferent patterns of adjustment to others 
in the work situation. 

2. The ability to see ourselves as others 
see us. Self-analysis with detachment 
and perspective. 

3. Analysis of specific patterns of construc- 
tive vs. destructive or asocial influence. 
Selfishness vs. altruism. 

Measurement Methods: 

1. Identification of sets, attitudes or affec- 
tive reactions relating to others or to 
one’s self which may determine behav- 
ior in the work situation. Projective 
methods, attitude studies, etc. 

2. Evaluation of typical reactions of aggres- 
sion or hostility, i.e., “going against” 
others. Estimate of destructive influ- 
ence on group activity. 

Remedial Methods: 

1. Counseling and psychotherapy. 

2. Educational methods intended to make 
children sensitive to group attitudes; 
education for citizenry in a democracy. 

3. Other factors contributing to mental 
health. 


Factor 9. Self-Criticism. The “set” of con- 
stantly looking for changes which would 
improve one’s self. This also includes a 
receptive, non-defensive attitude toward 
constructive direction or criticism from 
without. 

Psychological Analysis: 

1. Contact with, and acceptance of, Reality. 

2. Constructive and objective rather than 
destructive and harsh criticism. 

3. Acceptance of criticism without negative 
emotional involvement. 

Measurement Methods: 

1. Comparison between self-ratings and ex- 
ternal evaluations. Rating scales. 

2. Identification of rationalizations and 
other projective devices to escape critical 
Reality. 

Remedial Methods: 

1. Where criticism is necessary, indicate 
that it refers only to a limited aspect of 
behavior with the personality as a whole 
being accepted and respected. 

2. Educational orientation with impersonal 
discussion of typical situations where 
failure of self-criticism results in mal- 
adjustment. 

3. Training in attitudes of self-criticism. 
Projects in exercising criticism. De- 
velop standards for applying self-criti- 
cism. 

4. Counseling and psychotherapy directed 
toward removing emotional blocks to 
self-criticism. 
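
The comparison between self-ratings and external evaluations named under Factor 9 can be sketched as a simple discrepancy score. The source does not prescribe any scoring procedure; the function name `rating_discrepancy`, the five-point scale and the sample ratings are all assumptions made for illustration.

```python
def rating_discrepancy(self_ratings, external_ratings):
    """Compare self-ratings with an outside rater's ratings, trait by trait.

    Returns (per-trait differences, mean difference). A positive value
    means the person rates himself higher than the outside rater does.
    Only traits rated by both parties are compared.
    """
    shared = sorted(set(self_ratings) & set(external_ratings))
    if not shared:
        raise ValueError("no traits were rated by both parties")
    differences = {t: self_ratings[t] - external_ratings[t] for t in shared}
    average = sum(differences.values()) / len(differences)
    return differences, average


# Hypothetical ratings on a five-point scale.
pupil = {"industry": 5, "initiative": 4, "accuracy": 4}
teacher = {"industry": 3, "initiative": 4, "accuracy": 2}

per_trait, overall = rating_discrepancy(pupil, teacher)
print(per_trait, round(overall, 2))
```

A consistently positive mean difference of this kind would suggest over-estimation of one's own work, though any interpretation would depend on the reliability of the rating scales actually used.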


Factor 10. Emotional Stability in Work. This 
refers specifically to affective reactions aris- 
ing primarily in the working situation. Al- 
though many emotional states reflect a dis- 
order of the whole personality, it is also 
postulated that emotional tensions may be 
specific to the work situation. 

Psychological Analysis: 

1. Ability to resist disintegrating influences 
of frustration in work, annoyance that 
produces petulant ineffective behavior, 
depression or rage reactions. 

2. Affective states involving anxiety and 
worry which distract attention and de- 
stroy concentration. 

3. Neurotic reactions to work. 

Measurement Methods: 

1. Incidence of temper tantrums, rage re- 
actions, hostility or aggressiveness, de- 
structiveness, etc., when frustrated. 

2. Formal analysis of patterns of affective 
reactions. 

a. Withdrawal or escape reactions. 
Flight vs. fight. 
b. Paranoid reactions. 
c. Cycloid affective reactions. 
d. Psychosomatic reactions. 
e. Compensatory reactions. 

Remedial Methods: 

1. Training in accepting and controlling 
affective reactions. 

a. Emotions are intensified by somatic 
reactions, hence relax or assume the 
bodily response for the opposite emo- 
tion. 

b. Delay action. 

c. Conditioning methods. 

2. Personality counseling and therapy. Re- 
moving emotional blocks to learning and 
self-realization. Insight. 


Factor 11. Budgeting Time. Optimum use of 
available time. 
Psychological Analysis: 
1. Ascertains amount of time available. 
2. Lists all activities to be done. Estimates 
time schedule. 
3. Evaluates importance of the job. How 
much time is it worth to spend on it? 
4. Evaluates the job to discover what 
major tasks will be. Assigns priorities. 
Has tools ready to begin. 
5. Apportions time realistically, doing first 
things first. 
Measurement Methods: 
1. Rating scales evaluating 
a. Ability to analyse task to set up 
work schedule. 
b. Quality of work schedules. 
c. Degree of adherence to schedule. 
2. Analysis of time investment, time wast- 
ed, etc. 
Remedial Methods: 
1. General orientation to fact that time is 
money. 
2. Training procedures, making time budg- 
ets, etc. 
3. Use of incentives, such as payment by 
piece work, etc. 
4. Training in having tools ready, arrang- 
ing good mechanical conditions. 
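
The “degree of adherence to schedule” and the “analysis of time investment” listed under Factor 11 might be quantified by comparing a time budget with the time actually spent. The following is a rough sketch under stated assumptions: the function name `schedule_adherence`, the scoring rule (total deviation from the budget as a fraction of total budgeted time) and the sample figures are hypothetical and do not come from the source.

```python
def schedule_adherence(planned, actual):
    """Compare planned minutes per task with minutes actually spent.

    Returns (adherence_percent, overrun_minutes). Adherence falls as the
    total deviation from the budget grows; the floor is 0 per cent.
    Tasks missing from `actual` are treated as not worked on at all.
    """
    total_planned = sum(planned.values())
    if total_planned <= 0:
        raise ValueError("planned time must be positive")
    deviation = sum(abs(actual.get(task, 0) - minutes)
                    for task, minutes in planned.items())
    overrun = sum(max(actual.get(task, 0) - minutes, 0)
                  for task, minutes in planned.items())
    adherence = max(0.0, 100.0 * (1 - deviation / total_planned))
    return adherence, overrun


# Hypothetical time budget for a written assignment, in minutes.
budget = {"outline": 30, "draft": 60, "check": 30}
spent = {"outline": 45, "draft": 60, "check": 15}

print(schedule_adherence(budget, spent))
```

Here the total time spent equals the total budgeted, yet the score is lowered because time was taken from checking and given to outlining, a pattern that a simple total would conceal.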


Factor 12. Following Directions. The atti- 
tude of seeking to understand and carry out 
instructions. 

Psychological Analysis: 

1. Ability to follow directions or instruc- 
tion from authority without negative 
affective reactions. 

2. “Set” of making effort to understand 
what is wanted, and not acting until 
satisfied with understanding. 

3. In general, “conformism” as an attitude. 

Measurement Methods: 

1. Per cent of time that person follows 
directions. 

2. Measurement of comprehension to dis- 
cover span of comprehension; directions 
tests. 

Remedial Methods: 

1. Psychological study of persons who are 
markedly negativistic or non-conformist. 

2. Practice at graded levels on directions 
tests. 

3. Teacher analysis of how to present ma- 
terial so that it can be easily compre- 
hended. Start small and add to it. 


Factor 13. Seeks Necessary Advice. Compre- 
hends limitations of knowledge and emo- 
tionally able to learn from others. 

Psychological Analysis: 

1. The attitude of consciously trying to 
improve efficiency by benefiting from 
the experience of others. Willing and 
able to learn from others. 

2. Seeks information concerning possible 
solutions before starting work. 

3. When blocked, seeks advice as to how 
to proceed. 

4. Positive emotional attitude toward au- 
thority. 

Measurement Methods: 

1. Rating scale of number of times advice 
sought. 

2. Evaluation of advice-seeking efforts to 
determine how efficiently done. 

3. How does child evaluate advice? 

Remedial Methods: 

1. Advice should be offered in friendly non- 
critical manner. 

2. Incentives to those who ask and utilize 
advice properly. 

3. Psychological evaluation for those with 
emotional blocks. 


Factor 14. Use of Research Sources. Ability 
to locate information. 

Psychological Analysis: 

1. Knowledge of the mechanics of using 
special sources such as libraries, diction- 
aries, books, graphic and tabular mate- 
rials, statistics, etc. Use of people as 
sources of information. 

2. Completeness of research investigation. 

3. Skill of research investigation. 

Measurement Methods: 

Many formal tests are available for evalu- 
ation of use of dictionary, index, library 
skills, use of graphic materials, etc. See 
Buros(). 

Remedial Methods: 

1. Training in the use and construction of 
books. 

a. Evaluation of qualifications of au- 
thor, date of publication, book re- 
views. 

b. Familiarizing with contents. Look- 
ing at chapter titles, preface, bold 
print and other indications of im- 
portant material, summary and con- 
clusions. 

c. Reading skill. This is dealt with ex- 
tensively in many sources. 

d. Use of index, bibliography, tabular 
and graphic materials. 

2. Use of libraries. 

a. Knowledge of physical layout. Use 
of special facilities, browsing rooms, 
etc. 

b. Use of indexes, card catalogues, etc. 

c. Knowledge of source materials, se- 
lection of most likely source, gather- 
ing all relevant material, etc. 













THORNE, L. W. 




3. Use of special research tools. Diction- 
aries, tabular and graphic materials. 
4. Training in alphabetizing. 


Factor 15. Organisation of Material. Under- 
standing and critical thinking relating to 
data. 

Psychological Analysis: 

1. Interpretation of data with special re- 
gard to validity. 

a. The “set” of critically evaluating 
everything. 

b. Semantic evaluation. Understanding 
the sources of error inherent in com- 
munication difficulties. 

c. Recognizing special interests, ulterior 
motives, propaganda, etc. 

d. Crude errors in interpretation (log- 
ic). 

e. Generalizing beyond the data. 

2. Ability to get the central thought, i.e., 
to differentiate between basic premises 
and corollaries. 

3. Ability to outline material. 

4. Understanding of the laws of probabil- 
ity, statistical and mathematical analysis. 

Measurement Methods: 

Formal tests are currently available for 
interpretation of data(), study skills(), 
ability to recognize propaganda, etc. Gen- 
eral expressive behavior may also be utilized 
in evaluation. 

Remedial Methods: 

1. General education. It has been said that 
a man's education may be evaluated by 
the quality of his intelligent guesses. 

2. Training in logic, semantics, mathe- 
matics of probability, statistics and the 
scientific method. 


Factor 16. Accuracy. General carefulness. 

Psychological Analysis: 

1. Individual understanding of the concept 
of the personal equation, i.e., understand- 
ing of the fact that mistakes are inevita- 
ble. 

2. “Accuracy sets.” The attitude of seek- 
ing to check on the correctness of work 
and its completeness. 

3. Knowledge of the methods whereby 
checking is accomplished. 

Measurement Methods: 

1. Analysis of errors of execution char- 
acteristic of each person. 

2. Consistency of checking each detail of 
work including completeness. 

3. Follow-up to check performance after 
training procedures. 

Remedial Methods: 

1. General orientation relating to the na- 
ture of “accuracy sets.” 

2. Training in the methods of checking 
accuracy. The following methods are 
suggested for specific instruction: 

a. Checking correctness of transcribing 
and understanding problem; check 
copying. 

b. Repeat by same method to see if re- 
sults identical. 

c. Repeat with alternative methods to 
check answers. 

d. Make estimate of answer to roughly 
check correctness. 

e. Repetition by another person. 

f. Checking with external standards. 

g. Pragmatic tests, as in roughly esti- 
mating before sawing a board. 

h. Repeatedly checking back to basic 
problem or hypothesis to make cer- 
tain the problem has not changed. 
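
Two of the checking methods suggested above for Factor 16, repeating a calculation by an alternative method and comparing the answer against a rough estimate, lend themselves to a brief worked illustration. The sketch below is not taken from the source: the function names, the 10 per cent tolerance and the sample figures are assumptions chosen only to show the idea.

```python
def add_both_ways(values):
    """Add a column of whole numbers downward and again upward.

    Returns (agrees, total). Adding in the reverse order is a simple
    instance of repeating the work by an alternative method.
    """
    downward = 0
    for value in values:
        downward += value
    upward = 0
    for value in reversed(values):
        upward += value
    return downward == upward, downward


def agrees_with_estimate(answer, estimate, tolerance=0.10):
    """True if the answer lies within `tolerance` (a fraction) of a rough estimate."""
    if estimate == 0:
        return answer == 0
    return abs(answer - estimate) <= tolerance * abs(estimate)


column = [48, 52, 39, 61]
rough = 4 * 50                      # "about four fifties"

ok, total = add_both_ways(column)
print(ok, total, agrees_with_estimate(total, rough))

# A copying error (48 written as 84) passes the first check, since the
# same wrong figure is added both ways, but fails the rough estimate.
miscopied = [84, 52, 39, 61]
ok_bad, total_bad = add_both_ways(miscopied)
print(ok_bad, total_bad, agrees_with_estimate(total_bad, rough))
```

The invented copying error shows why several independent checks are recommended together: each method catches a different class of mistake.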


The first ten factors deal primarily 
with broad personality traits or attitudes 
which are considered essential for good 
work habits. Major emphasis is placed 
on general ideological orientation to work, 
self-regulation and control, emotional sta- 
bility and interpersonal relations. It is 
intended that the person will assimilate 
the different attitudes and “sets” into his 
conception of himself in the role of the 
worker. An effort has been made to 
interpret the importance of these attitudes 
and “sets,” and to explain how they are 
to be accomplished. It is intended that 
the person will verbalize the processes to 
himself, developing effective self-start- 
ing verbal cues. The second six factors 
deal with the actual mechanics of work 
and particularly study habits. 


CLINICAL APPLICATIONS 


Ideally, good work habits are acquired 
as a primary result of an efficient educa- 
tional process and automatically operate 
as a prime preventative of personality 
maladjustments. Clinical experience, 
however, indicates that it is frequently 
left almost to chance whether or not any 
particular young person receives the in- 
tensive training and supervision neces- 
sary for the learning of good work habits. 
Current educational methods provide no 
formal opportunities for the intensive 
evaluation of the work habits of each 
child with particular reference to coun- 
seling and remedial work. Special atten- 
tion may be given to the “problem” child 
in enlightened school systems or where 
the family can afford special tutoring, but 
too often the primary emphasis is upon 
securing good marks rather than upon 
remedial training. Fortunately, most 
pupils acquire sufficiently good work 
habits by themselves to be able to live up 
to average standards of productivity but 
a significantly large group become casualties 
of the educational system as they 
are unable to acquire these skills under 
the standard “convoy” system. 

It is not the purpose of this paper to 
describe the morbid personality reactions 
and life maladjustments which result 
from the failure to work effectively. 
There is urgent need of detailed situation- 
al analysis to describe the psychology of 
the person who has never learned to work. 
Indeed, with the popularity of progressive 
education and the psychoanalytic orienta- 
tion to personality development, such fac- 
tors as situational adjustment to work 
have been largely ignored in the search 
for “depth” factors in adjustment. It is 
our contention that next to getting along 
with people, marriage and sex, adjust- 
ment to work constitutes the major prob- 
lem of life. Effective work habits are 
entirely learned, and as such, cannot be 
taken for granted in the absence of inten- 
sive training opportunities. 


Philosophical Rationale. Healthy atti- 
tudes toward work (Factor 1, Serious- 
ness of Purpose) must ultimately depend 
upon the role of work in the value sys- 
tems of the person. Faced realistically, 
many work situations are basically un- 
pleasant because of (a) the innate un- 
pleasantness of the work, (b) regimenta- 
tion, monotony and boredom, (c) lack of 
feelings of creativeness or productive- 
ness, (d) the surrender of personal free- 
dom and opportunity to do more pleasant 
things, and (e) many other factors char- 
acterized by negative valence. While 
certain genetic strains (e.g., working ani- 
mals) appear to have innate predisposi- 
tions to work and require little external 
motivation, it is equally true that large 
numbers of animals and humans will not 
work except under conditions of external 
compulsion. Threatened by starvation 
unless they work, large segments of the 
population work only just sufficiently to 
satisfy basic needs. Thus it is found 
necessary to devise many types of re- 
ward and punishment to keep school chil- 
dren and factory workers at their tasks. 
While incentives such as monetary re- 
wards or prestige operate as powerful 
motives to work, it appears that healthy 
attitudes toward work are the basic 
foundation of good work habits. 

Even the most superficial sociologic ob- 
servations reveal marked regional differ- 
ences in attitudes toward work. In New 
England, thrift and industriousness are 
valued highly and are importantly em- 
phasized in child training. To be lazy 
or untidy is to risk social ostracism. The 
result is that these traits are prominently 
found in all classes including the lowest 
economic strata with few exceptions. 
Personnel supervisors in the industrial 
centers during World War II actively 
sought workers from Maine, New Hamp- 
shire and Vermont because of the rela- 
tively healthy attitudes toward work 
shown by these people. In contrast, the 
lower economic classes of the South fre- 
quently display culturally different atti- 
tudes toward work. Viewing themselves 
as socially higher than Negroes, they feel 
themselves too “good” to work. The re- 
sult is that there arises a shiftless, irre- 
sponsible, lazy class known as “poor 
white trash” whose actual sociological 
position may frequently be inferior to 
that of the Negro who is ready and will- 
ing to work. 

The situation is still further compli- 
cated by a welter of superstitious and er- 
roneous attitudes(3, 5) such as the following: 

“I'm not going to kill myself working for 
anybody. They (the capitalists) expect you to 
work yourself to the bone, and then what do you 
get? Nothing.” 

“I wouldn't get in a work rut. It’s unhealthy.” 

“I don’t want my children to have to work 
the way I did.” 

“Only the lower classes work. Ladies and 
gentlemen shouldn't soil their hands.” 

“He’s dressed like a working man. Let's stay 
away from them.” 

“I’m not going to take orders from anybody. 
Who does he think he is, anyway?” 

“You couldn't pay me enough to do that kind 
of work.” 


The prevalence of such unhealthy atti- 
tudes toward work renders it desirable to 
evaluate the attitudes of each person, and 
to institute appropriate educational meth- 
ods to instill more tenable philosophical 
conceptions of the values of work in life. 
In the presence of unhealthy core atti- 
tudes, it is not to be expected that other 
work skills will be highly developed. 

The basic objective is to secure indi- 
vidual acceptance of the idea that to 
work is good and that each person has 
the ability to improve his basic work 
skills. To this end, it is suggested that a 
coordinated orientation program be or- 
ganized to begin a graduated program of 
training in kindergarten which may be 
carried through to the highest levels of 
graduate study. The emphasis in such a 
program should not be on any grading or 
marking system, since this would only 
serve to stimulate defensive reactions, but 
rather to encourage each person to im- 
prove his own efficiency in a noncompeti- 
tive situation. The general orientation 
may be accomplished through lecture and 
other materials which attempt to estab- 
lish the rationale of the program in the 
child’s mind. Various direct and indirect 
methods may be used to analyse each 
child’s work habits, to provide explana- 
tions of what is being done wrong, to 
provide periodic checks so that the per- 
son may estimate his own progress, and 
to provide pleasurable work situations so 
that the child gradually is habituated to 
tasks of increasing difficulty. 


Improving Work Efficiency. Although 
objective methods of evaluating over-all 
work efficiency are not available currently, 
the assumption that such efficiency is dis- 
tributed in the population according to 
normal probability curves would mean 
that the average person rarely averages 
more than 50% of what he is capable of. 
Current industrial and labor practices are 
so adjusted that individuals are rarely 
stimulated to maximum production levels 
except in piece-work operations or other 
incentive systems. The person who 
achieves 80 or 90% efficiency is so rare 
that such accomplishment automatically 
assures material success in our society. 
Indeed, such labor practices as feather- 
bedding and other social restrictions tend 
to operate to reduce initiative and to dis- 
criminate against the efficient worker who 
may find himself an outcast. The suc- 
cess of the American system is a reflec- 
tion of the degree to which efficient work 
processes both individually and industri- 
ally have made possible hitherto unknown 
material accomplishments. While medioc- 
rity may be carried along in American 
culture on the achievements of the doers, 
the young person who expects to achieve 
any unusual degree of success must be 
prepared to train himself to such perfec- 
tion of performance as to make effective 
competition possible. 

The wide dissemination of such atti- 
tudes in education should result in stimu- 
lating younger generations to make use 
of the counseling facilities which are now 
being made available. Having gained 
comprehension of the concept of improv- 
ing work efficiency, it is to be expected 
that the desired mental sets will be more 
universally acquired among all young peo- 
ple. The best worker will receive con- 
siderable prestige and self-satisfaction 
for his performance. 


The Habit of Efficient Work. Although 
efficient work habits may individually re- 
quire varying periods of training run- 
ning into decades for their accomplish- 
ment, once acquired they become habitual 
and serve as the firm bedrock upon which 
healthy personality may develop. Prob- 
ably good work habits contribute more 
dependably to happiness and personality 
growth than any other activity in which 
a person may indulge. Far into old age, 
and after other sources of pleasure have 
been outgrown, the habit of working 
steadily creates productivity and self- 
realization which can be depended upon 
to make life bearable after all other sup- 
ports have been lost. Work therefore 
becomes something which is to be en- 
gaged in vigorously as long as strength 
permits. The objective, then, is not to 
seek a life of leisure but of creativeness ; 
not to find relaxation in idleness, but in 
a change of work; not to seek happiness 
in old age in retirement from work, but 
to keep busy as long as life lasts. This 
is not intended to minimize the values of 
other worthwhile activities such as recrea- 
tion or social activities, but simply to 
suggest that healthy work habits are per- 
haps the most valuable accomplishment of 
all. 


SUMMARY 


The relation of good work habits to 
personality adjustment is discussed in de- 
tail. Conversely, the etiologic role of 
poor work habits in neurotic maladjust- 
ment is outlined. It is postulated that 
efficient work habits are not instinctive 
but can only be obtained through inten- 
sive training procedures according to the 
psychology of learning. As the result of 
intensive theoretical discussions in an 
educational workshop devoted to this 
subject, a list of 16 factors contributing 
to efficient work habits is outlined to- 
gether with psychological analysis of each 
factor and suggestions for objective meas- 
urement. Clinical applications of the gen- 
eral concept are discussed. 


CLINICAL PSYCHOLOGY IN 1950 


Now at the beginning of the second 
half of the 20th century it may be truth- 
fully stated that clinical psychology has 
passed through adolescence and entered a 
vigorous young adulthood. In 1945 when 
this Journal began publication, the field 
was experiencing war-stimulated grow- 
ing pains and the directions of future 
development were still unclear. The fu- 
ture looked hopeful but a realization of 
its potentialities had yet to be accom- 
plished. The last five years have indeed 
witnessed remarkable achievements in the 
organizational and promotional aspects of 
the field. Through the cooperation of the 
Veterans Administration and the Ameri- 
can Psychological Association, the devel- 
opment of adequate training facilities in 
clinical psychology has been stimulated, 
given intelligent direction, and evaluated 
through periodic appraisal of approved 
training facilities. Under the auspices of 
the American Psychological Association, 
there has been established an American 
Board of Examiners in Professional Psy- 
chology which has already made an im- 
pressive beginning in accomplishing the 
certification of competent personnel. 
Progress has also been made in the diffi- 
cult task of perfecting selection proce- 
dures for accepting the most promising 
students in the limited graduate training 
programs. State and local recognition of 
clinical psychology has occurred in sev- 
eral areas of the nation with the passage 
of laws governing professional licensure 
and practice. To be sure there have been 
some vigorous controversies on the ques- 
tion of who shall be considered a clinical 
psychologist, as in Pennsylvania where 
a State licensure law was recently defeat- 
ed mainly through the efforts of an in- 
surgent group of non-Ph.D. psychologists 
who resented the efforts of the leader- 
ship of the state organization to legislate 
them out of practice. Such public dis- 
plays of lack of harmony within the pro- 
fession are unfortunate in the sense that 
they might have been prevented through 
resolution of disagreements before any 
attempt was publicly made to secure pas- 
sage of the legislation. There are some 
members of our profession who resent 
and would rigorously repress the insur- 
gency of non-doctorate psychologists, but 
such opposition is perhaps both desirable 
and healthy in achieving a practical reso- 
lution of the various interests involved. 




While granting every possible oppor- 
tunity for allowing the new divisional or- 
ganization of the American Psychologi- 
cal Association to demonstrate its effec- 
tiveness in adequately representing the 
interests of clinical psychology, there ap- 
pears to be some legitimate doubt con- 
cerning its ultimate efficiency. On pro- 
gram-planning levels, the divisional plan 
of organization provides ample opportu- 
nity for representation of all interests. On 
organizational levels, there has been some 
confusion and inefficiency related to the 
large number of divisions with partially 
overlapping interests and functions. In 
relation to the numbers of members in- 
volved, and the importance of the prac- 
tical issues to be dealt with, the Division 
of Clinical and Abnormal Psychology 
finds itself much in the position of the 
tail wagging the dog. Perhaps of the 
greatest importance is the problem of se- 
curing adequate financial support and in- 
dependency of action where clinical psy- 
chologists are represented by a Division 
in the parent organization rather than by 
an independent organization. The activ- 
ities of the Division of Clinical Psychol- 
ogy have been seriously impaired by the 
fact that its share of members’ dues is 
relatively inconsequential in comparison 
with the share of the American Psycho- 
logical Association. The situation ap- 
pears to be quite comparable to the rela- 
tion of the American Medical Associa- 
tion to the American Psychiatric Asso- 
ciation. The American Medical Associa- 
tion has its Division of Neurology and 
Psychiatry which operates primarily on 
program-planning levels at national con- 
ventions, while the function of represent- 
ing the interests of psychiatry as a spe- 
cialty lies with the American Psychiatric 
Association. It is still too early to pass 
judgment on the present organization of 
the American Psychological Association 
but it appears that the burden of proof 
rests on those who claim that the interests 
of clinical psychology can be adequately 
represented under the present plan. As 
the numerical representation of clinical 
psychologists increases, this problem will 
become increasingly acute as financial 
needs become progressively greater. 




Although there still remains some resi- 
due of suspicion and sensitivity between 
individual psychiatrists and clinical psy- 
chologists, the relations between the two 
professions have become steadily closer 
and improved. The establishment of offi- 
cial committees representing each organi- 
zation provides a mechanism for the har- 
monious resolution of controversial is- 
sues. It is to be anticipated that clinical 
psychology will also explore its interpro- 
fessional relations with education and 
other neighboring fields in which applied 
psychology is being increasingly utilized. 
It is significant that the 1949 report of 
the APA Committee on Training in Clini- 
cal Psychology contains the statement 
that private practice by single, independ- 
ent psychologists offers much less value 
either to the client or to the psychologist 
than does the team or group approach in 
association with competent members of 
other professions. In the past, many 
clinical psychologists have expressed the 
desire to affiliate themselves in group 
practice with other specialists (notably 
psychiatrists) but only a few have been 
able to find congenial connections. A 
profitable venture of the future might be 
to secure a list of psychiatrists and other 
neighboring specialists who would be 
willing to enter into closer relations with 
accredited clinical psychologists. It is 
gratifying to note that many of the more 
progressive psychiatrists have made pub- 
lic statements that they would not con- 
sider working without the cooperation of 
clinical psychologists. 

In our opinion, the most important is- 
sue confronting clinical psychology is the 
validation of its own theoretical founda- 
tions and its practical techniques. Our 
most urgent need is for a rigorous and 
unceasing critical evaluation of all our 
concepts going back to basic funda- 
mentals. Clinical psychology appears to 
be reproducing the evolutionary pattern 
of clinical medicine in which clinical prac- 
tice considerably antedated basic science 
with the result that there accumulated a 
great mass of theories and intuitive meth- 
ods. With the advent of basic science in 
medicine, it became necessary to reevalu- 
ate the theories and practice of the pre- 
scientific era in order to differentiate be- 
tween fact and superstition. Medicine 
did not earn the right to be considered sci- 
entific until this validation process had 
been largely accomplished through the 
elevation of the basic sciences to their 
proper primary position as the only tena- 
ble foundation of clinical practice. 
Without being unduly pessimistic or 
hypercritical, it is our opinion that clini- 
cal psychology should immediately em- 
bark upon a comprehensive reexamina- 
tion and validation of all its basic the- 
ories and practices because of the fact 
that few of them have been adequately 
validated. As has been previously noted, 
clinical psychology has had a rather 
unique developmental pattern in that 
much of its subject matter has been req- 
uisitioned from neighboring disciplines. 
Thus, much of current thinking in clinical 
psychology has been appropriated from 
psychiatry and psychoanalysis. Without 
disparaging the accomplishments of psy- 
chiatry and psychoanalysis, the fact is 
that many of their basic theories have 
been derived intuitively or empirically 
and have not been scientifically validated. 
The confused state of current theory and 
practice may be illustrated by examples 
which impressively demonstrate the il- 
logical and invalid nature of methods 
based on untenable premises. Since diag- 
nostic classifications and methods are gen- 
erally agreed to be the cornerstone of the 
basic science of psychopathology, we may 
briefly comment upon them first. First is 
the significant fact that outside of scanty 
information concerning anatomical locali- 
zation, we know relatively nothing about 
brain function. Scientific psychology has 
described certain behavioral manifesta- 
tions of central nervous system function 
such as sensing, perceiving, remembering, 
learning, etc., but little is known concern- 
ing how these are mediated. Second, with 
the exception of the organic psychoses 
and certain psychosomatic disorders based 
on known anatomical or physiological 
pathology, our psychopathological and 
diagnostic conceptions of the functional 
disorders are elementary and inadequate. 
Although we glibly attempt to classify 
cases as schizophrenia, obsessive-compul- 
sive neurosis, or what have you, the ex- 
perienced clinician constantly recognizes 
the invalidity of such procedures. What 
is schizophrenia? Is it a genuine entity, 
or a diagnostic wastebasket into which 
we classify cases for which we can find 
no better name? The increasing realiza- 
tion of the frequency with which severe 
anxiety neuroses or manic-depressive 
states verge into schizophreniform proc- 
esses tends to make us doubtful about our 
classifications. The more experience we 
acquire, the higher is the incidence of 
cases in which we are unable to make 
one of the traditional diagnoses and in 
which we recognize processes which are 
inadequately explained by current the- 
ories. Until the pathological mechanisms 
of the functional disorders are revealed, 
we must recognize that all our proposi- 
tions are merely assumptions to be ac- 
cepted only tentatively. In view of the 
small amount which is actually known of 
brain function or its psychopathology, one 
is entitled to wonder at the positivity and 
uncriticality of the majority of present- 
day clinicians in all the psychological 
fields. Third, we see cause for serious 
criticism of those who place so much 
emphasis on theories which are too often 
based on limited observations of intuitive- 
empiric type. Thus, we are living in an 
era which has uncritically swallowed the 
teachings of psychoanalysis and related 
schools to the point where unconscious 
motivations are read into all behavior. 
On all sides we hear glib references to 
Freudian mechanisms, orthopsychiatric 
concepts, etc., whereby clinicians project 
esoteric meanings onto behavior. One 
has only to encounter a few cases which 
have experienced almost every known 
type of trauma and displayed all the 
known pathological (?) mechanisms but 
are still normal to begin to question such 
slavish obeisance to current fads. Again, 
this is not to discredit the acknowledged 
contributions of our neighboring disci- 
plines, but simply to urge a much more 
critical attitude in their application. 
Similar comments would appear to apply 
to those clinicians whose thinking is per- 
haps over-conditioned by “projective” 
concepts. Fourth, a further source of 
diagnostic error may be found in the 
theoretical confusion attendant upon the 
fact that there are so many breeds of 
psychologists. Psychology is perhaps 
the only “science” in which there is not 
universal acceptance of the eclectic prin- 
ciple that the limits of the subject matter 
of the field are determined primarily by 
what has been objectively demonstrated 
rather than by theoretical adherences. 
Medical teaching and practice over the 
world is remarkably standardized by the 
fact that only facts are given primary 
weight with theories being legibly brand- 
ed only as hypotheses which may or may 
not have any definitive value. The house 
of psychology is still a Babel of confus- 
ing theoretical positions. Although in- 
dividual psychologists pride themselves 
on their scientific sophistication, the state 
of the field as a whole does not reflect any 
wholesale translation of objectivity into 
practice. Fifth, diagnostic methods ap- 
pear to have suffered from excessive pre- 
occupation with laboratory methods and 
psychometrics to the neglect of the inten- 
sive training of the clinical judgment of 
the observer. It is time to insist that all 
such methods are only tools which can 
never be more valid than the validity of 
the theory and practice upon which they 
are based. Published research reports 
are beginning to reveal what the experi- 
enced clinician could have predicted from 
the beginning, namely that such devices 
as the MMPI, scatter analysis of Wech- 
sler-Bellevue results, projective methods, 
etc., cannot be depended upon to yield more 
valid results than the inherent validity of 
the theoretical approach and technical ex- 
cellence of the person who administers 
them. The uncritical use of “diagnostic” 
signs has led many clinicians into errors 
of diagnosis and prediction which could 
have been prevented by detailed knowl- 
edge of the limitations of the methods in 
question. 

Similar comments would appear to ap- 
ply to methods of therapy which are not 
founded upon what has been objectively 
demonstrated in scientific psychology. 
Although a good deal of admirable work 
has been accomplished in describing what 
takes place in various methods of ther- 
apy, explanations of the rationale of most 
methods are still very unconvincing. The 
literature of the field is still replete with 
references to such mystic or teleological 
concepts as “growth principles,” “uncon- 
scious” mechanisms, “tele,” “strong” or 
“weak” Egos, etc. We can only conclude 
that theoretical concepts which do not 
conform to the established facts of ob- 
jective psychology cannot be accepted 
into the realm of the scientific. Some of 
the worst offenders against this principle 
are the psychiatrists and psychoanalysts 
who profess no immediate knowledge of 
what has taken place in modern psychol- 
ogy. We cannot be too uncompromising 
in unfrocking those who profess to be 
“scientific” clinicians in the absence of 
any broad and comprehensive understand- 
ing and application of what modern psy- 
chological science is. Another source of 
unsatisfactory therapeutic work is the 
fact that too many therapists are play- 
ing one-stringed violins. Too many thera- 
pists are known for their work with single 
methods such as psychoanalysis, hypno- 
analysis, psychodrama, non-directivism, 
etc. There is little evidence that current 
clinical training programs offer compre- 
hensive theoretical and practical exercises 
with all known methods. Instead, it is 
left more or less to chance whether any 
particular student acquires any broad ex- 
perience with all methods and all clinical 
types. The future in this respect will not 
be particularly encouraging until some 
of our most prominent leaders espouse 
eclecticism rather than cultism. 




What is to be done to quickly accom- 
plish our ideal of confining ourselves 
rigidly only to concepts and practices 
which have scientific validity? The first 
step would appear to be the comprehen- 
sive development of basic science. We 
need to go back and reevaluate our clini- 
cal concepts in terms of the bedrock of 
objective psychology. We need to re- 
evaluate clinical observations in terms 
of known abnormalities in sensing, per- 
ceiving, remembering, learning, thinking, 
etc. We need to return to the biological 
orientation, remembering that man is first 
a biological creature. Clinical practice 
can only be as valid as the basic science 
upon which it is based. This means that 
prime weight must constantly be placed 
on developing basic science as rapidly as 
facilities permit, and on insisting that every 
clinician undergo a continuous train- 
ing process designed to keep him pro- 
gressively acquainted with recent devel- 
opments. The initial honeymoon and 
excitement attendant upon the pioneer 
days of clinical psychology are over; from 
now on, only hard grinding at basic sci- 
ence can produce dividends. 

Second, we need to assign to matters of 
diagnosis the primary importance which 
they deserve. In the absence of valid 
diagnosis, we do not even know what we 
are dealing with. Once we discover the 
etiologic factor in a disorder, a plan of 
therapy suggests itself immediately, i.e., 
diagnosis is the crux of the problem. The 
diagnostic systems of the past are grossly 
inadequate and should be immediately re- 
worked. Attention should be given not 
only to reevaluating old classifications, 
but also to filling in large diagnostic gaps 
which now exist. The diagnosis of the 
past has been a function of institutional 
psychiatry which has dealt with relatively 
malignant conditions. The psychiatrist 
of the past never came in contact with 
many types of mild cases which are now 
recognized. There is great need for an 
amalgamation of the clinical experience 
of the psychiatrist and psychoanalyst 
with the theory and methodology of the 
scientific psychologist. The tendency in 
some circles to bypass diagnosis on the 
basis that it is not necessary for therapy 
cannot be too strongly castigated as be- 
ing both illogical and unscientific. 
Third, we need to achieve a realistic 
working balance between the use of clini- 
cal judgment and objective measurement 
in all diagnostic and therapeutic func- 
tions. A clinical psychologist should not 
be considered to have completed his 
training until he has acquired reasonable 
clinical judgment and demonstrated the 
ability to use it effectively. In general, 
this means that he must have developed 
his clinical judgment to the point where 
he can reach reliable and valid judgments 
in the absence of psychometric instru- 
ments or laboratory procedures. The 
situation is quite comparable to that of 
the general practitioner of medicine who 
must reach a diagnosis and treat the pa- 
tient in the absence of medical center 
methods. In medical education, there is 
strong insistence that the student must 
develop his own clinical judgment as a 
primary instrument to be checked by but 
not replaced with laboratory procedures. 
The clinical psychologist must not be 
afraid to use his new-found clinical judg- 
ment, particularly in situations where it 
does not agree with psychometric or 
other test results. In some respects, psy- 
chology is still in the brass-instrument 
stage in which there is exaggerated re- 
spect for laboratory methods or statisti- 
cal results. These have their place but 
should never supersede clinical judgment. 

There is increasing evidence that the 
psychologist or psychiatrist of the future 
must be a master of all related specialties 
in order to gain any organismic perspec- 
tive of the whole person in his behavioral 
field. The psychological scientist must 
be an integrator, able to weigh data from 
every significant source. Thus far, clini- 
cal judgment is the only tool for integrat- 
ing the whole data. In many respects, 
the clinical psychology of the past has 
lacked the all-around balance which it 
will need in the future. Although less 
well developed and not booming as it is 
in the United States, there is evidence 
that European psychology and psychiatry 
are in some respects closer to basic funda- 
mentals. We might mention the many 
genetic and constitutional studies appear- 
ing in the Scandinavian Acta Psychia- 
trica et Neurologica, English neurology 
and statistical studies, or the prewar Ger- 
man studies of training methods as illus- 
trations of trends which need to be fur- 
ther developed in America. F.C.T. 
During the Summer of 1950, a series of Vacation Workshops in Clinical 
Psychology will be held at Grand Isle, Lake Champlain, Vermont. These 
Workshops are designed to provide intensive explorations in areas of Theory 
and Practice which have not been extensively covered in the literature. 
The Workshops will be under the direction of Frederick C. Thorne, M.D., 
I. DIAGNOSTIC PROBLEMS IN COUNSELING AND PSYCHOTHERAPY. 
tered in the standard categories of disorders. Special emphasis will be placed 
on the indications for utilizing the various methods of therapy. 
from Monday through Saturday. There will be approximately 25 hours of intensive work and 
discussion in each Workshop. The afternoons will be left free for rest or recreation. 


DATES. Final dates will depend upon convenience of registrants. Tentative dates are: Course I, 
June 19-24, July 17-22 or August 21-26. Course II, June 26-July 1, July 24-29 or August 28- 
the Master’s degree. It is expected that each registrant will have had sufficient clinical experience 
to be able to contribute to the Workshop. Only a limited number will be accepted at each 
Workshop. 


FEES. A fee of $25.00 will be charged for each Workshop, payable in advance. 
insure a reservation. Registrants should apply early. 


ACCOMMODATIONS. Several high quality hotels, boarding houses and tourist camps are adjacent 
to the site of the Workshop. Room and board may be obtained at from $5.00 per day up. The 
Workshop will assume responsibility for locating accommodations for everyone. 
type desired. 